^* These authors contributed equally to this work.

¹ Barcelona Institute for Global Health.
² Department of Psychiatry and Psychotherapy, Heidelberg University.
³ Univ Rennes, Inserm, EHESP, Irset.

^✉ Correspondence: Augusto Anguita-Ruiz <augusto.anguita@isglobal.org>

1 Introduction.

The current report serves as an online companion to the manuscript “To be supplied [”Title”. Author X et al. 2022.]”. With this analysis, we demonstrate the suitability of our method “dsLassoCov” for feature selection in a real scenario with complex exposome data under the privacy-protected federated DataSHIELD system.

2 Methods.

In this section, we will describe the data employed for the showcase, giving details about the experimental design, the type of features available, and the structure and organization of the data in the Opal infrastructure.

2.1 HELIX project.

The research dataset employed here derives from the HELIX (Human Early-Life Exposome) project. HELIX gathers data from six longitudinal European birth cohorts with the aim of evaluating the effect of environmental risk factors on mothers’ and children’s health. The HELIX cohorts are BIB (Born in Bradford; United Kingdom), EDEN (Étude des Déterminants pré et postnatals du développement et de la santé de l’ENfant; France), INMA (INfancia y Medio Ambiente; Spain), KANC (Kaunas Cohort; Lithuania), MoBa (Norwegian Mother and Child Cohort Study; Norway), and Rhea (Mother-Child Cohort in Crete; Greece). General details of the study design can be found in Figure 1. The whole HELIX dataset includes a total of 31,472 mother-child pairs. Among them, a subcohort of 1,298 children (approximately 200 children in each cohort) was selected for this study according to the following eligibility criteria: 1) age 6 to 11 years at the time of outcome evaluation; 2) complete address history; and 3) no serious health problems that might affect the clinical testing or the child’s safety.

Figure 1. General overview of the HELIX research project.

In these 1,298 children, a wide range of environmental exposures was evaluated to define the early-life exposome during two time periods: the prenatal (pregnancy) period and the postnatal period (childhood, age 6 to 11 years). The collected exposures cover the three main parts of the exposome: outdoor exposures, chemical exposures, and lifestyle factors. All variables in the dataset were pre-processed before analysis (normalized and scaled, outliers removed, and missing values imputed). Regarding phenotype data, a wide range of health outcomes is also available for the postnatal (childhood) period, including outcomes related to (1) obesity and cardiometabolic health, (2) respiratory health, and (3) cognition and mental health.

HELIX Exposome data included information for 208 exposures and 88 phenotypes (Table 1).

Table 1. General description for the HELIX exposome dataset.
Cohort	N Individuals	N Predictors	N Phenotypes
BIB_cohort	202	208	88
EDEN_cohort	198	208	88
INMA-SAB_cohort	223	208	88
KANC_cohort	204	208	88
MoBA_cohort	272	208	88
Rhea_cohort	199	208	88
Combined Studies	1298	208	88

2.2 Experimental Set-Up and research hypothesis.

The outcomes of interest in the present study were systolic and diastolic blood pressure (SBP and DBP) in the postnatal period (childhood). These outcomes have been previously investigated in this population, and the main findings can be found in (Warembourg et al. 2019) and (Warembourg et al. 2021). In this showcase, the available exposure data comprised 115 postnatal and 93 prenatal variables. Among them, we focused only on continuous exposures (77 postnatal and 69 prenatal). A separate model was run for each outcome (SBP and DBP) and each time point (prenatal and postnatal).

Nine confounders were identified from a directed acyclic graph (Figure 2): cohort of recruitment, maternal age (continuous, in years), maternal educational level (low, middle, or high), self-reported maternal pre-pregnancy body mass index (continuous, in kg/m²), parity (nulliparous, primiparous, or multiparous), native status (whether the child’s family is native to the country of recruitment), and child age, height, and sex. Cohort of recruitment was coded as six dummy variables (one per cohort), of which only five were included as confounders, giving 13 adjusting covariates in total.

Figure 2. Directed acyclic graph used to identify confounders in the association between the early-life exposome and child blood pressure.
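
The reason for keeping only five of the six cohort dummies can be illustrated with a small local sketch in base R. It uses a toy cohort factor rather than the HELIX data, and it assumes a model with an intercept column, which is an assumption about the model and not something stated in this report.

```r
# Toy cohort factor with the six HELIX cohort labels (three children each; not real data)
cohort <- factor(rep(c("BIB", "EDEN", "INMA", "KANC", "MOBA", "RHEA"), each = 3))

# Full one-hot coding: six indicator columns, one per cohort
full_dummies <- model.matrix(~ cohort - 1)

# Every child belongs to exactly one cohort, so the six columns always sum to 1
stopifnot(all(rowSums(full_dummies) == 1))

# Dropping one reference cohort (EDEN, as in the showcase) removes the redundancy
dummies <- full_dummies[, colnames(full_dummies) != "cohortEDEN"]

# Rank of the design matrix with an intercept column (assumed here for illustration)
rank_full    <- qr(cbind(1, full_dummies))$rank  # 7 columns, rank 6: collinear
rank_reduced <- qr(cbind(1, dummies))$rank       # 6 columns, rank 6: full rank
```

With all six dummies, the indicator columns add up to the intercept column, so the design matrix is rank deficient; with five dummies every column carries independent information.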

2.2.1 Data organization in the Opal server.

The dataset was uploaded to the Opal BRGE site hosted by the Bioinformatic Research Group in Epidemiology of ISGlobal, simulating a single-site DataSHIELD infrastructure. Details for accessing the server can be found in the respective DataSHIELD analysis section. Data were stored as .csv files (one per cohort, outcome, and time point) and can be easily loaded as tables into the DataSHIELD environment. The structure and organization of the HELIX data in the Opal server are summarized in Figure 3.

Figure 3. Input HELIX dataset organized by cohort in the Opal server.

2.3 Data loading and preparation in DataSHIELD.

In this section, we show how to load the data into the DataSHIELD environment. We start by creating the connection to the Opal server with a user who has DataSHIELD permissions. Please note that in our example all datasets are hosted in the same Opal server, but each cohort sub-dataset is accessed separately.

# Loading DataSHIELD required packages
require(DSOpal)
require(DSI)
require(dsBaseClient)
require(dsExposomeClient)
require(dsMTLClient)
require(dsHelper)

# Loading additional required packages
require(dplyr)
require(ggplot2)
require(ggrepel)
require(tidyverse)
require(reshape2)
require(RColorBrewer)
require(kableExtra)
require(grid)
require(gridExtra)
require(lattice)
require(ggpubr)

# Create connections
builder <- DSI::newDSLoginBuilder()
builder$append(server = "BIB", url = "https://opal.isglobal.org/repo",
               user =  "invited", password = "12345678Aa@",
               profile = "rock-lasso")
builder$append(server = "EDEN", url = "https://opal.isglobal.org/repo",
               user =  "invited", password = "12345678Aa@",
               profile = "rock-lasso")
builder$append(server = "KANC", url = "https://opal.isglobal.org/repo",
                user =  "invited", password = "12345678Aa@",
                profile = "rock-lasso")
builder$append(server = "MoBA", url = "https://opal.isglobal.org/repo",
               user =  "invited", password = "12345678Aa@",
               profile = "rock-lasso")
builder$append(server = "Rhea", url = "https://opal.isglobal.org/repo",
               user =  "invited", password = "12345678Aa@",
               profile = "rock-lasso")
builder$append(server = "INMASAB", url = "https://opal.isglobal.org/repo",
               user =  "invited", password = "12345678Aa@",
               profile = "rock-lasso")
logindata <- builder$build()
conns <- DSI::datashield.login(logins = logindata)

Then, we load all the .csv tables available in the Opal server, one per cohort, outcome, and time point, using the DSI::datashield.assign.table() function. This function takes the connections created in the previous code chunk and assigns each Opal table to an R object in the remote DataSHIELD session.

# We assign the resources for the DBP outcome
DSI::datashield.assign.table(conns[1], "dataDBP_PRE", "HELIX.data1DBP_pre", async = FALSE)
DSI::datashield.assign.table(conns[2], "dataDBP_PRE", "HELIX.data2DBP_pre", async = FALSE)
DSI::datashield.assign.table(conns[3], "dataDBP_PRE", "HELIX.data3DBP_pre", async = FALSE)
DSI::datashield.assign.table(conns[4], "dataDBP_PRE", "HELIX.data4DBP_pre", async = FALSE)
DSI::datashield.assign.table(conns[5], "dataDBP_PRE", "HELIX.data5DBP_pre", async = FALSE)
DSI::datashield.assign.table(conns[6], "dataDBP_PRE", "HELIX.data6DBP_pre", async = FALSE)

DSI::datashield.assign.table(conns[1], "dataDBP_POS", "HELIX.data1DBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[2], "dataDBP_POS", "HELIX.data2DBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[3], "dataDBP_POS", "HELIX.data3DBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[4], "dataDBP_POS", "HELIX.data4DBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[5], "dataDBP_POS", "HELIX.data5DBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[6], "dataDBP_POS", "HELIX.data6DBP_POS", async = FALSE)


# We assign the resources for the SBP outcome
DSI::datashield.assign.table(conns[1], "dataSBP_PRE", "HELIX.data1SBP_PRE", async = FALSE)
DSI::datashield.assign.table(conns[2], "dataSBP_PRE", "HELIX.data2SBP_PRE", async = FALSE)
DSI::datashield.assign.table(conns[3], "dataSBP_PRE", "HELIX.data3SBP_PRE", async = FALSE)
DSI::datashield.assign.table(conns[4], "dataSBP_PRE", "HELIX.data4SBP_PRE", async = FALSE)
DSI::datashield.assign.table(conns[5], "dataSBP_PRE", "HELIX.data5SBP_PRE", async = FALSE)
DSI::datashield.assign.table(conns[6], "dataSBP_PRE", "HELIX.data6SBP_PRE", async = FALSE)

DSI::datashield.assign.table(conns[1], "dataSBP_POS", "HELIX.data1SBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[2], "dataSBP_POS", "HELIX.data2SBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[3], "dataSBP_POS", "HELIX.data3SBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[4], "dataSBP_POS", "HELIX.data4SBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[5], "dataSBP_POS", "HELIX.data5SBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[6], "dataSBP_POS", "HELIX.data6SBP_POS", async = FALSE)
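
The 24 assignments above follow a regular naming pattern, so the table names can also be generated programmatically. The following is a minimal sketch in base R that only builds the names; the `grid` data frame and its columns are hypothetical helpers introduced here, and the suffix case must match the table names as they exist on the server (note that the DBP prenatal tables above use a lowercase `_pre`). The actual assignment call is left as a comment because it needs an open connection.

```r
# Outcomes, time points, and number of cohorts used in the showcase
outcomes  <- c("DBP", "SBP")
periods   <- c("PRE", "POS")
n_cohorts <- 6

# One row per cohort / time point / outcome combination (hypothetical helper table)
grid <- expand.grid(cohort = seq_len(n_cohorts), period = periods,
                    outcome = outcomes, stringsAsFactors = FALSE)

# Name of the R object created in the remote session, e.g. "dataDBP_PRE"
grid$symbol <- paste0("data", grid$outcome, "_", grid$period)

# Name of the Opal table, e.g. "HELIX.data1DBP_PRE"
grid$table <- paste0("HELIX.data", grid$cohort, grid$outcome, "_", grid$period)

# With an open connection object `conns`, the assignments could then be looped:
# for (i in seq_len(nrow(grid))) {
#   DSI::datashield.assign.table(conns[grid$cohort[i]], grid$symbol[i],
#                                grid$table[i], async = FALSE)
# }
```

Generating the names in one place makes it easier to spot inconsistencies in the naming scheme before any call is sent to the server.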

Since dsLassoCov functions require data of each cohort to be passed as separate matrices for predictors and the outcome, we still needed to do some additional data preparation (i.e., predictor data were assigned to a matrix called “X”, and the outcome to an object called “Y”).

# Checks on the DBP dataset
ds.class("dataDBP_POS")
ds.dim("dataDBP_POS")

# Checks on the SBP dataset
ds.class("dataSBP_POS")
ds.dim("dataSBP_POS")

# Assign outcome data to a separate object called Y
ds.assign(toAssign='dataDBP_POS$Y', newobj='Y_DBP', datasources = conns)
ds.assign(toAssign='dataSBP_POS$Y', newobj='Y_SBP', datasources = conns)

# Create a vector with all ones
ds.make(toAssign = "Y_DBP-Y_DBP+1",newobj = "ONES",datasources = conns)

# Select predictors and assign them to a separate object called X. One of the cohort
# dummy variables (cohort EDEN) is excluded to avoid perfect collinearity between predictors.
ds.dataFrameSubset(df.name = 'dataDBP_PRE',  V1.name = "ONES",  V2.name = "ONES",  Boolean.operator = "==",keep.cols = c(1,3:84),
                   newobj = 'X_DBP_PRE',  datasources = conns)
ds.dataFrameSubset(df.name = 'dataDBP_POS',  V1.name = "ONES",  V2.name = "ONES",  Boolean.operator = "==",keep.cols = c(1,3:92),
                   newobj = 'X_DBP_POS',  datasources = conns)
ds.dataFrameSubset(df.name = 'dataSBP_PRE',  V1.name = "ONES",  V2.name = "ONES",  Boolean.operator = "==",keep.cols = c(1,3:84),
                   newobj = 'X_SBP_PRE',  datasources = conns)
ds.dataFrameSubset(df.name = 'dataSBP_POS',  V1.name = "ONES",  V2.name = "ONES",  Boolean.operator = "==",keep.cols = c(1,3:92),
                   newobj = 'X_SBP_POS',  datasources = conns)

# Checks on created files
ds.dim('X_DBP_PRE')
ds.dim('X_DBP_POS')
ds.dim('X_SBP_PRE')
ds.dim('X_SBP_POS')

ds.length('Y_DBP')
ds.length('Y_SBP')

# See covariates names:
ds.colnames('X_DBP_PRE')[[1]][1:13]
ds.colnames('X_SBP_POS')[[1]][1:13]

# We coerce both Xs and Y objects into matrix-type objects
ds.asMatrix(x.name = 'X_DBP_PRE', newobj = 'X_DBP_PRE', datasources = conns)
ds.asMatrix(x.name = 'X_DBP_POS', newobj = 'X_DBP_POS', datasources = conns)
ds.asMatrix(x.name = 'X_SBP_PRE', newobj = 'X_SBP_PRE', datasources = conns)
ds.asMatrix(x.name = 'X_SBP_POS', newobj = 'X_SBP_POS', datasources = conns)

ds.asMatrix(x.name = 'Y_DBP', newobj = 'Y_DBP', datasources = conns)
ds.asMatrix(x.name = 'Y_SBP', newobj = 'Y_SBP', datasources = conns)

# We store the names of the server-side objects in client-side R variables,
# so that they can be passed to the dsLassoCov functions below
X_DBP_PRE <- "X_DBP_PRE"; X_DBP_POS <- "X_DBP_POS"; Y_DBP <- "Y_DBP"
X_SBP_PRE <- "X_SBP_PRE"; X_SBP_POS <- "X_SBP_POS"; Y_SBP <- "Y_SBP"

3 Results.

3.1 Lasso regression on Blood pressure: estimation of optimal hyperparameters.

The first step of the analysis was a 5-fold cross-validation over a sequence of 50 lambda values (estimated from the data) to identify the optimal lambda hyperparameter, which was selected according to the mean squared error (MSE) criterion.

# The "opts" parameter controls the optimization algorithm used to minimize the
# objective function (sum of squared errors, SSE) during coefficient estimation
opts <- list(init = 0, maxIter = 100, tol = 0.001, ter = 2)

# Identification of the optimal lambda value by k-fold cross-validation for each outcome
cvResult_DBP_PRE <- ds.LassoCov_CVInSite(X=X_DBP_PRE, Y=Y_DBP, type="regress", nlambda=50, lam_ratio=0.01, 
                              opts=opts, covar=c(1:13), datasources=conns, nDigits=4, 
                              nfolds=5)
cvResult_DBP_POS <- ds.LassoCov_CVInSite(X=X_DBP_POS, Y=Y_DBP, type="regress", nlambda=50, lam_ratio=0.01, 
                              opts=opts, covar=c(1:13), datasources=conns, nDigits=4, 
                              nfolds=5)
cvResult_SBP_PRE <- ds.LassoCov_CVInSite(X=X_SBP_PRE, Y=Y_SBP, type="regress", nlambda=50, lam_ratio=0.01, 
                              opts=opts, covar=c(1:13), datasources=conns, nDigits=4, 
                              nfolds=5)
cvResult_SBP_POS <- ds.LassoCov_CVInSite(X=X_SBP_POS, Y=Y_SBP, type="regress", nlambda=50, lam_ratio=0.01, 
                              opts=opts, covar=c(1:13), datasources=conns, nDigits=4, 
                              nfolds=5)

cvResult_DBP_PRE$lambda.min

[1] 0.1263654

cvResult_DBP_POS$lambda.min

[1] 0.01969534

cvResult_SBP_PRE$lambda.min

[1] 0.04561056

cvResult_SBP_POS$lambda.min

[1] 0.02002656
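
The selection rule used above (lowest cross-validated MSE over a grid of lambda values) can be sketched locally in base R. This is an illustration on simulated MSE values, not the dsLassoCov implementation: the helper names `make_lambda_seq` and `select_lambda_min` are hypothetical, and the log-spaced grid running from a maximum lambda down to `lam_ratio` times that maximum is an assumption about how such a sequence is typically built.

```r
# Hypothetical helper: log-spaced grid from lambda_max down to lam_ratio * lambda_max
make_lambda_seq <- function(lambda_max, nlambda = 50, lam_ratio = 0.01) {
  exp(seq(log(lambda_max), log(lambda_max * lam_ratio), length.out = nlambda))
}

# Hypothetical helper: return the lambda with the lowest MSE averaged over folds
select_lambda_min <- function(mse_fold, lambda_seq) {
  mean_mse <- colMeans(mse_fold)   # one averaged MSE per lambda value
  lambda_seq[which.min(mean_mse)]
}

set.seed(42)
lambda_seq <- make_lambda_seq(lambda_max = 1)

# Simulated 5 x 50 matrix of fold-wise MSE values (rows = folds, columns = lambdas),
# built as a U-shaped curve with its minimum near lambda = 0.1 plus small noise
mse_fold <- t(replicate(5, (log10(lambda_seq) + 1)^2 + 1 + rnorm(50, sd = 0.01)))

lambda_min <- select_lambda_min(mse_fold, lambda_seq)
```

Averaging over folds before taking the minimum is what makes the choice robust to the noise of any single fold.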

DBP: optimal lambda was 0.13 for prenatal and 0.02 for postnatal.

# PRENATAL
# Prepare model results for plot
dataset <- na.omit(reshape2::melt(cvResult_DBP_PRE$mse_fold))
LambdaVal <- round(apply(cvResult_DBP_PRE$lam_seq,2,mean),2)
LambdaVal <- paste(unique(dataset$Var2),"=",LambdaVal)

# Set range of colors for lambdas
fun_color_range <- colorRampPalette(c("#08737f", "#f95559"))   # Apply colorRampPalette
my_colors <- fun_color_range(50)  

# Boxplot showing the averaged MSE obtained for lambda values over folds
G1 <- ggplot(dataset,aes(x=Var1,y=value,fill=Var2))+geom_boxplot()+ labs(x = "Averaged lambda over folds",
y = "Mean squared error", fill='Lambda value') + scale_fill_manual(values=my_colors,labels = LambdaVal) + theme(legend.position = "none")+ ggtitle("DBP Prenatal")


# POSTNATAL
# Prepare model results for plot
dataset <- na.omit(reshape2::melt(cvResult_DBP_POS$mse_fold))
LambdaVal <- round(apply(cvResult_DBP_POS$lam_seq,2,mean),2)
LambdaVal <- paste(unique(dataset$Var2),"=",LambdaVal)

# Set range of colors for lambdas
fun_color_range <- colorRampPalette(c("#08737f", "#f95559"))   # Apply colorRampPalette
my_colors <- fun_color_range(50)  

# Boxplot showing the averaged MSE obtained for lambda values over folds
G2 <- ggplot(dataset,aes(x=Var1,y=value,fill=Var2))+geom_boxplot()+ labs(x = "Averaged lambda over folds",
y = "Mean squared error", fill='Lambda value') + scale_fill_manual(values=my_colors,labels = LambdaVal) + theme(legend.position = "none")+ ggtitle("DBP Postnatal")

SBP: optimal lambda was 0.05 for prenatal and 0.02 for postnatal.

# PRENATAL
# Prepare model results for plot
dataset <- na.omit(reshape2::melt(cvResult_SBP_PRE$mse_fold))
LambdaVal <- round(apply(cvResult_SBP_PRE$lam_seq,2,mean),2)
LambdaVal <- paste(unique(dataset$Var2),"=",LambdaVal)

# Set range of colors for lambdas
fun_color_range <- colorRampPalette(c("#08737f", "#f95559"))   # Apply colorRampPalette
my_colors <- fun_color_range(50)  

# Boxplot showing the averaged MSE obtained for lambda values over folds
G3 <- ggplot(dataset,aes(x=Var1,y=value,fill=Var2))+geom_boxplot()+ labs(x = "Averaged lambda over folds",
y = "Mean squared error", fill='Lambda value') + scale_fill_manual(values=my_colors,labels = LambdaVal) + theme(legend.position = "none")+ ggtitle("SBP Prenatal")

# POSTNATAL
# Prepare model results for plot
dataset <- na.omit(reshape2::melt(cvResult_SBP_POS$mse_fold))
LambdaVal <- round(apply(cvResult_SBP_POS$lam_seq,2,mean),2)
LambdaVal <- paste(unique(dataset$Var2),"=",LambdaVal)

# Set range of colors for lambdas
fun_color_range <- colorRampPalette(c("#08737f", "#f95559"))   # Apply colorRampPalette
my_colors <- fun_color_range(50)  

# Boxplot showing the averaged MSE obtained for lambda values over folds
G4 <- ggplot(dataset,aes(x=Var1,y=value,fill=Var2))+geom_boxplot()+ labs(x = "Averaged lambda over folds",
y = "Mean squared error", fill='Lambda value') + scale_fill_manual(values=my_colors,labels = LambdaVal) + theme(legend.position = "none")+ ggtitle("SBP Postnatal")

ggarrange(G1, G2, G3, G4, ncol = 2, nrow = 2, labels=c("A","B","C","D"))

3.2 Optimal models: Diastolic Blood Pressure.

The optimal lambda was 0.13 for the prenatal and 0.02 for the postnatal period. Predictors 1 to 13 in the dataset were defined as the adjusting covariates.

# Solver of Lasso Regression
# Each model is trained with the optimal lambda found for its own time point
OptimalModel_DBP_PRE <- ds.LassoCov_Train(X=X_DBP_PRE, Y=Y_DBP, type="regress", lambda=cvResult_DBP_PRE$lambda.min, covar=c(1:13), opts=opts, datasources=conns, nDigits=15)
names(OptimalModel_DBP_PRE$ws) <- ds.colnames("X_DBP_PRE")[[1]]
OptimalModel_DBP_POS <- ds.LassoCov_Train(X=X_DBP_POS, Y=Y_DBP, type="regress", lambda=cvResult_DBP_POS$lambda.min, covar=c(1:13), opts=opts, datasources=conns, nDigits=15)
names(OptimalModel_DBP_POS$ws) <- ds.colnames("X_DBP_POS")[[1]]
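
To give an intuition for what declaring columns as adjusting covariates does, the sketch below fits a lasso in which the columns listed in `covar` are left unpenalized, using a plain proximal-gradient loop in base R on simulated data. It is a local, non-federated illustration under the assumption that adjusting covariates are exempt from the L1 penalty; it is not the dsLassoCov solver, and the function name `lasso_unpenalized_covar` is hypothetical.

```r
# Soft-thresholding operator used in the lasso proximal step
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Proximal-gradient lasso; columns listed in `covar` are never shrunk
lasso_unpenalized_covar <- function(X, y, lambda, covar, max_iter = 5000, tol = 1e-8) {
  n <- nrow(X)
  p <- ncol(X)
  w <- rep(0, p)
  penalized <- setdiff(seq_len(p), covar)
  # Step size 1/L, where L is the Lipschitz constant of the gradient of (1/2n)||y - Xw||^2
  step <- n / max(eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values)
  for (iter in seq_len(max_iter)) {
    grad  <- -crossprod(X, y - X %*% w) / n
    w_new <- as.vector(w - step * grad)
    # Only the non-covariate coefficients receive the L1 shrinkage
    w_new[penalized] <- soft_threshold(w_new[penalized], step * lambda)
    converged <- max(abs(w_new - w)) < tol
    w <- w_new
    if (converged) break
  }
  w
}

# Simulated data: columns 1 to 3 play the role of adjusting covariates
set.seed(1)
n <- 200
X <- scale(matrix(rnorm(n * 10), n, 10))
y <- as.vector(0.5 * X[, 1] + 0.02 * X[, 2] + 0.4 * X[, 5] + rnorm(n, sd = 0.5))
y <- y - mean(y)

w_hat <- lasso_unpenalized_covar(X, y, lambda = 0.1, covar = 1:3)
# With a very large lambda, every penalized coefficient is shrunk to exactly zero,
# while the covariates keep their least-squares estimates
w_big <- lasso_unpenalized_covar(X, y, lambda = 10, covar = 1:3)
```

Under this assumption the covariates stay in the model at any lambda, which is consistent with all 13 adjusting covariates appearing in every optimal model reported in this section.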

Below, the number and additional details for selected predictors in each model are shown.

# PRENATAL
# Get the number of selected variables
sum(OptimalModel_DBP_PRE$ws!=0)

[1] 13

# Show estimated coefficients and additional metadata for selected variables
# (codebook_vars, the exposome codebook, is assumed to be already loaded in the client session)
toplot <- data.frame(OptimalModel_DBP_PRE$ws,ds.colnames("X_DBP_PRE")[[1]])
colnames(toplot) <- c("Coefficient","Index")
toplot_ <- codebook_vars[toplot[,2],c("Period","Group")]
toplot_save <- cbind(toplot,toplot_)
toplot_save <- toplot_save[which(toplot_save$Coefficient!=0),c(2,1,3,4)]
rownames(toplot_save) <- NULL
knitr::kable(toplot_save[order(abs(toplot_save$Coefficient),decreasing=T),], caption = "DBP Prenatal Optimal model") %>% row_spec(0,bold=TRUE) %>% kable_styling()

DBP Prenatal Optimal model
	Index	Coefficient	Period	Group
1	h_cohort_BIB	0.4702208	Pregnancy	Key Covariates
4	h_cohort_RHEA	-0.3599177	Pregnancy	Key Covariates
3	h_cohort_MOBA	-0.1713993	Pregnancy	Key Covariates
5	h_cohort_INMA	0.1677149	Pregnancy	Key Covariates
10	h_native_None	-0.0888654	Pregnancy	Key Covariates
13	hs_c_height_None	0.0883577	Postnatal	Key Covariates
6	h_age_None	0.0602509	Pregnancy	Key Covariates
8	h_mbmi_None	0.0496987	Pregnancy	Key Covariates
7	h_edumc_None	-0.0483261	Pregnancy	Key Covariates
12	hs_child_age_days_None	0.0466813	Postnatal	Key Covariates
9	h_parity_None	-0.0406308	Pregnancy	Key Covariates
2	h_cohort_KANC	-0.0338111	Pregnancy	Key Covariates
11	e3_sex_None	-0.0195808	Pregnancy	Key Covariates

toplot$vargroup <- rep(NA,nrow(toplot))
toplot$vargroup[which(toplot$Coefficient > 0)] <- "Positive"
toplot$vargroup[which(toplot$Coefficient == 0)] <- "Null"
toplot$vargroup[which(toplot$Coefficient < 0)] <- "Negative"

G5 <- ggplot(toplot, aes(x=Index, y=Coefficient, group = vargroup, label=Index)) +
  geom_point(size=2,aes(color = vargroup)) +
  scale_color_manual(values=c("#CD534CFF","black","#227CAD")) +
  geom_hline(yintercept=0, linetype='dashed', col = "#868686FF") +
  geom_label_repel(data         = subset(toplot, toplot$vargroup!="Null" ),
                   size          = 2,
                   box.padding   = 0.5,
                   point.padding = 0.5,
                   force         = 100,
                   segment.size  = 0.2,
                   segment.color = "grey50", max.overlaps = 8
                   ) +
  theme(text = element_text(size=6), axis.text.x = element_text(angle = 90),legend.position = "none") + ggtitle("DBP Prenatal Optimal model") + xlab("Exposures") + ylab("Coefficients")

# POSTNATAL
# Get the number of selected variables
sum(OptimalModel_DBP_POS$ws!=0)

[1] 43

# Show estimated coefficients and additional metadata for selected variables
toplot <- data.frame(OptimalModel_DBP_POS$ws,ds.colnames("X_DBP_POS")[[1]])
colnames(toplot) <- c("Coefficient","Index")
toplot_ <- codebook_vars[toplot[,2],c("Period","Group")]
toplot_save <- cbind(toplot,toplot_)
toplot_save <- toplot_save[which(toplot_save$Coefficient!=0),c(2,1,3,4)]
rownames(toplot_save) <- NULL
knitr::kable(toplot_save[order(abs(toplot_save$Coefficient),decreasing=T),], caption = "DBP Postnatal Optimal model") %>% row_spec(0,bold=TRUE) %>% 
kable_styling()

DBP Postnatal Optimal model
	Index	Coefficient	Period	Group
1	h_cohort_BIB	0.4518531	Pregnancy	Key Covariates
4	h_cohort_RHEA	-0.3514934	Pregnancy	Key Covariates
5	h_cohort_INMA	0.1744509	Pregnancy	Key Covariates
3	h_cohort_MOBA	-0.1676391	Pregnancy	Key Covariates
10	h_native_None	-0.0866137	Pregnancy	Key Covariates
13	hs_c_height_None	0.0756864	Postnatal	Key Covariates
6	h_age_None	0.0661424	Pregnancy	Key Covariates
28	hs_mg_c_Log2	0.0589452	Postnatal	Essential minerals
20	hs_cu_c_Log2	0.0537502	Postnatal	Metals
43	PSS_4_Score_None	0.0433349	Postnatal	Others
12	hs_child_age_days_None	0.0396376	Postnatal	Key Covariates
25	hs_mbzp_cadj_Log2	-0.0390572	Postnatal	Phthalates
9	h_parity_None	-0.0387779	Pregnancy	Key Covariates
7	h_edumc_None	-0.0380214	Pregnancy	Key Covariates
31	hs_na_c_Log2	-0.0355180	Postnatal	Essential minerals
27	hs_mep_cadj_Log2	-0.0347788	Postnatal	Phthalates
8	h_mbmi_None	0.0325115	Pregnancy	Key Covariates
21	hs_dde_cadj_Log2	-0.0315272	Postnatal	OCs
2	h_cohort_KANC	-0.0313477	Pregnancy	Key Covariates
41	hs_tm_mt_hs_h_None	-0.0302424	Postnatal	Meteorological
16	hs_as_c_Log2	-0.0288183	Postnatal	Metals
17	hs_bupa_cadj_Log2	-0.0270073	Postnatal	Phenols
11	e3_sex_None	-0.0236309	Pregnancy	Key Covariates
35	hs_pbde47_cadj_Log2	0.0209138	Postnatal	PBDEs
34	hs_pbde153_cadj_Log2	-0.0202860	Postnatal	PBDEs
40	hs_pm25abs_yr_hs_h_Log	0.0145727	Postnatal	Air Pollution
42	hs_trcs_cadj_Log2	0.0132693	Postnatal	Phenols
39	hs_pfos_c_Log2	0.0116524	Postnatal	PFASs
22	hs_dmp_cadj_Log2	-0.0110715	Postnatal	OP Pesticides
26	hs_mehp_cadj_Log2	-0.0107084	Postnatal	Phthalates
32	hs_oxominp_cadj_Log2	0.0104273	Postnatal	Phthalates
15	hs_accesspoints300_h_Log	-0.0086248	Postnatal	Built Environment
14	h_Absorbance_Log	0.0081270	Postnatal	Indoor air
37	hs_pcb138_cadj_Log2	-0.0069165	Postnatal	OCs
23	hs_etpa_cadj_Log2	0.0063827	Postnatal	Phenols
29	hs_mo_c_Log2	-0.0032549	Postnatal	Metals
19	hs_co_c_Log2	-0.0030013	Postnatal	Metals
18	hs_cd_c_Log2	0.0027984	Postnatal	Metals
36	hs_pcb118_cadj_Log2	-0.0024705	Postnatal	OCs
30	hs_mvpa_prd_alt_None	0.0015177	Postnatal	Lifestyle
24	hs_k_c_Log2	0.0012254	Postnatal	Essential minerals
38	hs_pcb170_cadj_Log2	-0.0004701	Postnatal	OCs
33	hs_pb_c_Log2	-0.0001753	Postnatal	Metals

toplot$vargroup <- rep(NA,nrow(toplot))
toplot$vargroup[which(toplot$Coefficient > 0)] <- "Positive"
toplot$vargroup[which(toplot$Coefficient == 0)] <- "Null"
toplot$vargroup[which(toplot$Coefficient < 0)] <- "Negative"

G6 <- ggplot(toplot, aes(x=Index, y=Coefficient, group = vargroup, label=Index)) +
  geom_point(size=2,aes(color = vargroup)) +
  scale_color_manual(values=c("#CD534CFF","black","#227CAD")) +
  geom_hline(yintercept=0, linetype='dashed', col = "#868686FF") +
  geom_label_repel(data         = subset(toplot, toplot$Coefficient > quantile(abs(toplot$Coefficient[toplot$Coefficient>0]),0.15) | toplot$Coefficient < -quantile(abs(toplot$Coefficient[toplot$Coefficient<0]) ,0.45)),
                   size          = 2,
                   box.padding   = 0.5,
                   point.padding = 0.5,
                   force         = 100,
                   segment.size  = 0.2,
                   segment.color = "grey50", max.overlaps = 8
                   ) +
  theme(text = element_text(size=6), axis.text.x = element_text(angle = 90),legend.position = "none") + ggtitle("DBP Postnatal Optimal model") + xlab("Exposures") + ylab("Coefficients")


ggarrange(G5, G6, ncol = 2, labels=c("A","B"))

The number of predictors remaining in the final models was 13 (confounders only) for the prenatal period and 43 (13 confounders + 30 exposures) for the postnatal period. Intriguingly, in the prenatal period no exposure was retained beyond the adjusting covariates at the selected lambda. In the postnatal period, by contrast, the model showed that exposure to metals such as copper, and to chemicals such as PFAS and phenols, was associated with higher blood pressure in children. Likewise, exposure to a high-stress environment (measured as perceived stress score) was associated with higher DBP. Some urban environment factors, such as accessibility to public transport, emerged as protective factors against high blood pressure. Among all exposures remaining in the final model as predictors of DBP, blood copper level showed one of the largest (non-confounder) effect sizes (β = 0.05). Many of these findings, especially those related to copper exposure or the effect of the urban environment on blood pressure, were already described in our previous publications (Warembourg et al. 2019) and (Warembourg et al. 2021).
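
The split between retained confounders and retained exposures can be read directly from the coefficient vector. The sketch below uses a toy vector with a few coefficients copied from the DBP postnatal table, plus one hypothetical unselected variable; in the real models the covariate positions would be `1:13`.

```r
# Illustrative subset of coefficients from the DBP postnatal table,
# plus one hypothetical variable that was shrunk to zero
ws <- c(h_cohort_BIB = 0.4518531, h_age_None = 0.0661424,
        hs_mg_c_Log2 = 0.0589452, hs_cu_c_Log2 = 0.0537502,
        hs_pb_c_Log2 = -0.0001753, hypothetical_unselected = 0)

covar <- 1:2  # positions of the adjusting covariates in this toy vector

n_selected   <- sum(ws != 0)          # all retained predictors
n_covariates <- sum(ws[covar] != 0)   # retained adjusting covariates
n_exposures  <- sum(ws[-covar] != 0)  # retained exposures
```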

3.3 Optimal models: Systolic Blood Pressure.

The optimal lambda was 0.05 for the prenatal and 0.02 for the postnatal period. Predictors 1 to 13 in the dataset were defined as the adjusting covariates.

# Solver of Lasso Regression
# Each model is trained with the optimal lambda found for its own time point
OptimalModel_SBP_PRE <- ds.LassoCov_Train(X=X_SBP_PRE, Y=Y_SBP, type="regress", lambda=cvResult_SBP_PRE$lambda.min, covar=c(1:13), opts=opts, datasources=conns, nDigits=15)
names(OptimalModel_SBP_PRE$ws) <- ds.colnames("X_SBP_PRE")[[1]]
OptimalModel_SBP_POS <- ds.LassoCov_Train(X=X_SBP_POS, Y=Y_SBP, type="regress", lambda=cvResult_SBP_POS$lambda.min, covar=c(1:13), opts=opts, datasources=conns, nDigits=15)
names(OptimalModel_SBP_POS$ws) <- ds.colnames("X_SBP_POS")[[1]]

Below, the number and additional details for selected predictors in each model are shown.

# PRENATAL
# Get the number of selected variables
sum(OptimalModel_SBP_PRE$ws!=0)

[1] 14

# Show estimated coefficients and additional metadata for selected variables
toplot <- data.frame(OptimalModel_SBP_PRE$ws,ds.colnames("X_SBP_PRE")[[1]])
colnames(toplot) <- c("Coefficient","Index")
toplot_ <- codebook_vars[toplot[,2],c("Period","Group")]
toplot_save <- cbind(toplot,toplot_)
toplot_save <- toplot_save[which(toplot_save$Coefficient!=0),c(2,1,3,4)]
rownames(toplot_save) <- NULL
knitr::kable(toplot_save[order(abs(toplot_save$Coefficient),decreasing = T),], caption = "SBP Prenatal Optimal model") %>% row_spec(0,bold=TRUE) %>% 
kable_styling()

SBP Prenatal Optimal model
	Index	Coefficient	Period	Group
1	h_cohort_BIB	0.5046683	Pregnancy	Key Covariates
13	hs_c_height_None	0.3476911	Postnatal	Key Covariates
2	h_cohort_KANC	-0.3461856	Pregnancy	Key Covariates
5	h_cohort_INMA	0.2611074	Pregnancy	Key Covariates
3	h_cohort_MOBA	-0.2104300	Pregnancy	Key Covariates
4	h_cohort_RHEA	-0.1598021	Pregnancy	Key Covariates
6	h_age_None	0.0788615	Pregnancy	Key Covariates
12	hs_child_age_days_None	0.0784751	Postnatal	Key Covariates
8	h_mbmi_None	0.0772208	Pregnancy	Key Covariates
10	h_native_None	-0.0583598	Pregnancy	Key Covariates
7	h_edumc_None	-0.0477134	Pregnancy	Key Covariates
11	e3_sex_None	0.0385127	Pregnancy	Key Covariates
14	h_fdensity300_preg_Log	-0.0198690	Pregnancy	Built Environment
9	h_parity_None	-0.0110181	Pregnancy	Key Covariates

toplot$vargroup <- rep(NA,nrow(toplot))
toplot$vargroup[which(toplot$Coefficient > 0)] <- "Positive"
toplot$vargroup[which(toplot$Coefficient == 0)] <- "Null"
toplot$vargroup[which(toplot$Coefficient < 0)] <- "Negative"

G7 <- ggplot(toplot, aes(x=Index, y=Coefficient, group = vargroup, label=Index)) +
  geom_point(size=2,aes(color = vargroup)) +
  scale_color_manual(values=c("#CD534CFF","black","#227CAD")) +
  geom_hline(yintercept=0, linetype='dashed', col = "#868686FF") +
  geom_label_repel(data         = subset(toplot, toplot$vargroup!="Null" ),
                   size          = 2,
                   box.padding   = 0.5,
                   point.padding = 0.5,
                   force         = 100,
                   segment.size  = 0.2,
                   segment.color = "grey50", max.overlaps = 8
                   ) +
  theme(text = element_text(size=6), axis.text.x = element_text(angle = 90),legend.position = "none") + ggtitle("SBP Prenatal Optimal model") + xlab("Exposures") + ylab("Coefficients")

# POSTNATAL
# Get the number of selected variables
sum(OptimalModel_SBP_POS$ws!=0)

[1] 39

# Show estimated coefficients and additional metadata for selected variables
toplot <- data.frame(OptimalModel_SBP_POS$ws,ds.colnames("X_SBP_POS")[[1]])
colnames(toplot) <- c("Coefficient","Index")
toplot_ <- codebook_vars[toplot[,2],c("Period","Group")]
toplot_save <- cbind(toplot,toplot_)
toplot_save <- toplot_save[which(toplot_save$Coefficient!=0),c(2,1,3,4)]
rownames(toplot_save) <- NULL
knitr::kable(toplot_save[order(abs(toplot_save$Coefficient),decreasing = T),], caption = "SBP Postnatal Optimal model") %>% row_spec(0,bold=TRUE) %>% 
kable_styling()

SBP Postnatal Optimal model
	Index	Coefficient	Period	Group
1	h_cohort_BIB	0.4363117	Pregnancy	Key Covariates
13	hs_c_height_None	0.3183230	Postnatal	Key Covariates
2	h_cohort_KANC	-0.2730910	Pregnancy	Key Covariates
5	h_cohort_INMA	0.2488328	Pregnancy	Key Covariates
3	h_cohort_MOBA	-0.2245687	Pregnancy	Key Covariates
4	h_cohort_RHEA	-0.1375123	Pregnancy	Key Covariates
6	h_age_None	0.0887953	Pregnancy	Key Covariates
21	hs_dde_cadj_Log2	-0.0750493	Postnatal	OCs
10	h_native_None	-0.0669917	Pregnancy	Key Covariates
8	h_mbmi_None	0.0607844	Pregnancy	Key Covariates
12	hs_child_age_days_None	0.0564032	Postnatal	Key Covariates
31	hs_mg_c_Log2	0.0531831	Postnatal	Essential minerals
11	e3_sex_None	0.0486127	Pregnancy	Key Covariates
27	hs_hcb_cadj_Log2	-0.0481409	Postnatal	OCs
39	PSS_4_Score_None	0.0380141	Postnatal	Others
28	hs_mbzp_cadj_Log2	-0.0281945	Postnatal	Phthalates
17	hs_as_c_Log2	-0.0265266	Postnatal	Metals
33	hs_pbde153_cadj_Log2	-0.0260163	Postnatal	PBDEs
37	hs_pfunda_c_Log2	0.0239332	Postnatal	PFASs
7	h_edumc_None	-0.0232285	Pregnancy	Key Covariates
24	hs_dmtp_cadj_Log2	-0.0217748	Postnatal	OP Pesticides
32	hs_na_c_Log2	-0.0208633	Postnatal	Essential minerals
14	h_Absorbance_Log	0.0197715	Postnatal	Indoor air
29	hs_mehp_cadj_Log2	-0.0187647	Postnatal	Phthalates
36	hs_pfoa_c_Log2	0.0108690	Postnatal	PFASs
20	hs_cu_c_Log2	0.0105436	Postnatal	Metals
34	hs_pbde47_cadj_Log2	0.0082825	Postnatal	PBDEs
9	h_parity_None	-0.0080572	Pregnancy	Key Covariates
30	hs_mep_cadj_Log2	-0.0080463	Postnatal	Phthalates
16	hs_accesspoints300_h_Log	-0.0071812	Postnatal	Built Environment
38	hs_pm25abs_yr_hs_h_Log	0.0068807	Postnatal	Air Pollution
15	h_Benzene_Log	0.0058062	Postnatal	Indoor air
18	hs_bupa_cadj_Log2	-0.0049075	Postnatal	Phenols
23	hs_dmp_cadj_Log2	-0.0034749	Postnatal	OP Pesticides
26	hs_frichness300_h_None	-0.0032700	Postnatal	Built Environment
35	hs_pfna_c_Log2	0.0022736	Postnatal	PFASs
25	hs_etpa_cadj_Log2	0.0019301	Postnatal	Phenols
22	hs_dep_cadj_Log2	-0.0019300	Postnatal	OP Pesticides
19	hs_cd_c_Log2	0.0000674	Postnatal	Metals

toplot$vargroup <- rep(NA,nrow(toplot))
toplot$vargroup[which(toplot$Coefficient > 0)] <- "Positive"
toplot$vargroup[which(toplot$Coefficient == 0)] <- "Null"
toplot$vargroup[which(toplot$Coefficient < 0)] <- "Negative"

G8 <- ggplot(toplot, aes(x=Index, y=Coefficient, group = vargroup, label=Index)) +
  geom_point(size=2,aes(color = vargroup)) +
  scale_color_manual(values=c(Negative="#CD534CFF", Null="black", Positive="#227CAD")) +
  geom_hline(yintercept=0, linetype='dashed', col = "#868686FF") +
  geom_label_repel(data         = subset(toplot, toplot$Coefficient > quantile(abs(toplot$Coefficient[toplot$Coefficient>0]),0.15) | toplot$Coefficient < -quantile(abs(toplot$Coefficient[toplot$Coefficient<0]) ,0.45)),
                   size          = 2,
                   box.padding   = 0.5,
                   point.padding = 0.5,
                   force         = 100,
                   segment.size  = 0.2,
                   segment.color = "grey50", max.overlaps = 8
                   ) +
  theme(text = element_text(size=6), axis.text.x = element_text(angle = 90),legend.position = "none") + ggtitle("SBP Postnatal Optimal model") + xlab("Exposures") + ylab("Coefficients")

ggarrange(G7, G8, ncol = 2, labels=c("A","B"))

The number of predictors remaining in the final models was 14 (13 confounders + 1 exposure) for the prenatal period and 39 (13 confounders + 26 exposures) for the postnatal period. Interestingly, a higher number of facility types near children’s residences in both the pregnancy and postnatal periods was associated with lower SBP values, as already observed by Warembourg et al. (2021). For the postnatal period, results were largely concordant with those obtained for DBP, and the most interesting associations remained, such as those found for a high-stress environment or blood copper levels.

In both cases (DBP and SBP), unadjusted models gave findings similar to those of the adjusted models in terms of selected variables and direction of associations. Nevertheless, compared with dsLassoCov, the uncorrected dsLasso approach yielded less sparse (and therefore less interpretable) models, selected some unexpected variables with extremely high effect sizes, and underestimated the effect of other variables, such as those identified as key confounders. For example, in the postnatal SBP unadjusted model, blood levels of hexachlorobenzene were selected as one of the exposures with the largest effect sizes, even above well-established predictors such as the cohort of recruitment or maternal BMI. Likewise, unadjusted models showed unexpected associations, such as an inverse relationship between SBP and the total traffic load of all roads within a 100 m buffer around the home. This finding is unexpected because previous studies have shown that noise and air pollution in cities are well-known triggers of high blood pressure and cardiovascular disease.
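The mechanism behind this contrast can be illustrated with a minimal sketch on synthetic data. It assumes that covariate control amounts to leaving the confounders unpenalized in the Lasso, which is one common way of doing it and is not confirmed here as the dsLassoCov implementation. The function `lasso_cd`, its `penalty_factor` argument, and the simulated variables (`c1`, `c2`, `e_conf`, `e_true`) are hypothetical names introduced only for illustration; the code runs locally in base R and is not federated.

```r
# Synthetic data only: two confounders drive the outcome, one exposure is merely
# correlated with them (no true effect), and one exposure has a real effect.
set.seed(1)
n <- 5000
c1 <- rnorm(n)
c2 <- rnorm(n)
e_conf <- 0.6 * c1 + 0.6 * c2 + sqrt(0.28) * rnorm(n)  # confounded exposure
e_true <- rnorm(n)                                     # exposure with a real effect
y <- c1 + c2 + 1.0 * e_true + rnorm(n)

X <- scale(cbind(c1, c2, e_conf, e_true))
y <- y - mean(y)

# Soft-thresholding operator used by the Lasso coordinate update
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Cyclic coordinate descent for
#   (1 / (2n)) * ||y - X b||^2 + lambda * sum(penalty_factor * |b|)
# A penalty factor of 0 leaves the corresponding variable unpenalized.
lasso_cd <- function(X, y, lambda, penalty_factor, n_sweeps = 200) {
  n <- nrow(X)
  b <- setNames(numeric(ncol(X)), colnames(X))
  r <- y  # residuals for b = 0
  for (sweep in seq_len(n_sweeps)) {
    for (j in seq_len(ncol(X))) {
      xj <- X[, j]
      xj_sq <- mean(xj^2)
      rho <- sum(xj * r) / n + b[[j]] * xj_sq
      b_new <- soft_threshold(rho, lambda * penalty_factor[j]) / xj_sq
      r <- r - xj * (b_new - b[[j]])  # keep residuals in sync with b
      b[[j]] <- b_new
    }
  }
  b
}

lambda <- 0.5
plain    <- lasso_cd(X, y, lambda, penalty_factor = c(1, 1, 1, 1))
adjusted <- lasso_cd(X, y, lambda, penalty_factor = c(0, 0, 1, 1))
round(rbind(plain, adjusted), 3)
```

In this synthetic setting, the uniformly penalized fit should shrink the two confounders and assign a nonzero coefficient to the confounded exposure `e_conf`, whereas the fit with unpenalized confounders should set `e_conf` to zero while still retaining `e_true`.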

# Get the number of selected variables
sum(OptimalModel_SBP_POS_nocovs$w!=0)

[1] 49

# Show estimated coefficients and additional metadata for selected variables
toplot <- data.frame(OptimalModel_SBP_POS_nocovs$w,ds.colnames("X_SBP_POS")[[1]])
colnames(toplot) <- c("Coefficient","Index")
toplot_ <- codebook_vars[toplot[,2],c("Period","Group")]
toplot_save <- cbind(toplot,toplot_)
toplot_save <- toplot_save[which(toplot_save$Coefficient!=0),c(2,1,3,4)]
rownames(toplot_save) <- NULL
knitr::kable(toplot_save[order(abs(toplot_save$Coefficient),decreasing=TRUE),], caption = "SBP Postnatal Unadjusted Optimal model") %>%
  row_spec(0,bold=TRUE) %>%
  kable_styling()

SBP Postnatal Unadjusted Optimal model
	Index	Coefficient	Period	Group
9	hs_c_height_None	0.2287006	Postnatal	Key Covariates
25	hs_hcb_cadj_Log2	-0.1448063	Postnatal	OCs
19	hs_dde_cadj_Log2	-0.0938842	Postnatal	OCs
8	hs_child_age_days_None	0.0751858	Postnatal	Key Covariates
6	h_native_None	-0.0684704	Pregnancy	Key Covariates
32	hs_mg_c_Log2	0.0583768	Postnatal	Essential minerals
3	h_age_None	0.0561176	Pregnancy	Key Covariates
49	PSS_4_Score_None	0.0538807	Postnatal	Others
5	h_mbmi_None	0.0518649	Pregnancy	Key Covariates
28	hs_mbzp_cadj_Log2	-0.0473453	Postnatal	Phthalates
37	hs_no2_yr_hs_h_Log	0.0452616	Postnatal	Air Pollution
45	hs_pm25abs_yr_hs_h_Log	0.0404114	Postnatal	Air Pollution
42	hs_pfoa_c_Log2	0.0397292	Postnatal	PFASs
11	h_NO2_Log	0.0369238	Postnatal	Indoor air
7	e3_sex_None	0.0339399	Pregnancy	Key Covariates
13	hs_accesspoints300_h_Log	0.0242645	Postnatal	Built Environment
47	hs_trafload_h_pow1over3	-0.0226582	Postnatal	Traffic
10	h_Absorbance_Log	0.0197312	Postnatal	Indoor air
40	hs_pbde47_cadj_Log2	0.0185071	Postnatal	PBDEs
39	hs_pbde153_cadj_Log2	-0.0176491	Postnatal	PBDEs
18	hs_cu_c_Log2	0.0174570	Postnatal	Metals
20	hs_dmp_cadj_Log2	-0.0173723	Postnatal	OP Pesticides
4	h_edumc_None	-0.0156090	Pregnancy	Key Covariates
34	hs_mnbp_cadj_Log2	-0.0151162	Postnatal	Phthalates
30	hs_mehp_cadj_Log2	-0.0145265	Postnatal	Phthalates
26	hs_hg_c_Log2	0.0142944	Postnatal	Metals
15	hs_bupa_cadj_Log2	-0.0138145	Postnatal	Phenols
24	hs_frichness300_h_None	-0.0137729	Postnatal	Built Environment
21	hs_dmtp_cadj_Log2	-0.0135055	Postnatal	OP Pesticides
27	hs_landuseshan300_h_None	0.0108409	Postnatal	Built Environment
22	hs_etpa_cadj_Log2	0.0100900	Postnatal	Phenols
31	hs_mepa_cadj_Log2	0.0099626	Postnatal	Phenols
14	hs_as_c_Log2	-0.0098313	Postnatal	Metals
46	hs_se_c_Log2	0.0088805	Postnatal	Essential minerals
36	hs_ndvi100_h_None	-0.0082579	Postnatal	Natural Spaces
43	hs_pfunda_c_Log2	0.0063179	Postnatal	PFASs
35	hs_na_c_Log2	-0.0042086	Postnatal	Essential minerals
33	hs_mibp_cadj_Log2	0.0040371	Postnatal	Phthalates
41	hs_pfhxs_c_Log2	0.0028548	Postnatal	PFASs
2	h_cohort_MOBA	-0.0025974	Pregnancy	Key Covariates
1	h_cohort_BIB	0.0025684	Pregnancy	Key Covariates
38	hs_ohminp_cadj_Log2	0.0011654	Postnatal	Phthalates
17	hs_cs_c_Log2	0.0007918	Postnatal	Metals
44	hs_pm25_yr_hs_h_None	0.0007645	Postnatal	Air Pollution
48	hs_uvdvf_mt_hs_h_None	0.0006950	Postnatal	Meteorological
12	h_TEX_Log	-0.0006882	Postnatal	Indoor air
23	hs_fdensity300_h_Log	-0.0003913	Postnatal	Built Environment
16	hs_cd_c_Log2	0.0000234	Postnatal	Metals
29	hs_mecpp_cadj_Log2	-0.0000119	Postnatal	Phthalates
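To make the comparison between the adjusted and unadjusted postnatal SBP models more concrete, the following sketch places a few coefficients side by side. The values are copied by hand from the two postnatal SBP tables in this section; the object names (`adjusted`, `unadjusted`, `comparison`) and the `Ratio` column are introduced here only for illustration and are not part of the original analysis.

```r
# Coefficients copied from the adjusted and unadjusted postnatal SBP tables
adjusted <- c(
  hs_c_height_None = 0.3183230,
  h_cohort_MOBA    = -0.2245687,
  hs_dde_cadj_Log2 = -0.0750493,
  h_mbmi_None      = 0.0607844,
  hs_hcb_cadj_Log2 = -0.0481409
)
unadjusted <- c(
  hs_c_height_None = 0.2287006,
  h_cohort_MOBA    = -0.0025974,
  hs_dde_cadj_Log2 = -0.0938842,
  h_mbmi_None      = 0.0518649,
  hs_hcb_cadj_Log2 = -0.1448063
)

# Align both vectors by variable name and compare absolute effect sizes
comparison <- data.frame(
  Index      = names(adjusted),
  Adjusted   = unname(adjusted),
  Unadjusted = unname(unadjusted[names(adjusted)])
)
comparison$Ratio <- abs(comparison$Unadjusted) / abs(comparison$Adjusted)

# Ratio > 1: larger effect in the unadjusted model; Ratio < 1: smaller effect
comparison[order(comparison$Ratio, decreasing = TRUE), ]
```

For this subset, the hexachlorobenzene coefficient is roughly three times larger in absolute value in the unadjusted model, while the MoBa cohort indicator is shrunk to almost zero.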

4 Conclusion.

With this showcase, we demonstrate that our method dsLassoCov and the DataSHIELD infrastructure are powerful tools for performing feature selection in complex scenarios such as exposome research datasets, in which strong confounding effects can bias research findings. The fact that two previously published papers using a similar approach report concordant results (Warembourg et al. 2019; Warembourg et al. 2021) also adds robustness to our statements. Especially in the case of sensitive multi-centre biomedical research studies, the use of federated-analysis platforms such as DataSHIELD should be promoted.

datashield.logout(conns)

5 Session Info.

sessionInfo()

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ggpubr_0.4.0           lattice_0.20-45        gridExtra_2.3         
 [4] kableExtra_1.3.4       RColorBrewer_1.1-3     reshape2_1.4.4        
 [7] forcats_0.5.2          stringr_1.4.0          purrr_0.3.4           
[10] readr_2.1.2            tidyr_1.2.1            tibble_3.1.5          
[13] tidyverse_1.3.1        ggrepel_0.9.1          ggplot2_3.3.6         
[16] dplyr_1.0.10           dsHelper_0.4.18.9000   dsMTLClient_0.9       
[19] corpcor_1.6.9          dsExposomeClient_2.0.4 dsBaseClient_6.1.1    
[22] DSOpal_1.3.1           DSI_1.4.0              R6_2.5.1              
[25] progress_1.2.2         opalr_3.1.1            httr_1.4.4            

loaded via a namespace (and not attached):
 [1] nlme_3.1-159      fs_1.5.2          lubridate_1.8.0   webshot_0.5.3    
 [5] tools_4.2.0       backports_1.2.1   bslib_0.4.0       utf8_1.2.2       
 [9] metafor_3.8-1     DBI_1.1.2         colorspace_2.0-3  withr_2.5.0      
[13] tidyselect_1.1.2  prettyunits_1.1.1 curl_4.3.2        compiler_4.2.0   
[17] cli_3.4.0         rvest_1.0.2       xml2_1.3.3        labeling_0.4.2   
[21] sass_0.4.2        scales_1.2.1      systemfonts_1.0.4 digest_0.6.28    
[25] rmarkdown_2.16    svglite_2.1.0     pkgconfig_2.0.3   htmltools_0.5.3  
[29] labelled_2.9.1    dbplyr_2.2.1      fastmap_1.1.0     highr_0.9        
[33] rlang_1.0.5       readxl_1.4.1      rstudioapi_0.14   farver_2.1.1     
[37] jquerylib_0.1.4   generics_0.1.2    jsonlite_1.7.2    car_3.1-0        
[41] magrittr_2.0.1    metadat_1.2-0     Matrix_1.4-1      Rcpp_1.0.9       
[45] munsell_0.5.0     fansi_0.5.0       abind_1.4-5       lifecycle_1.0.1  
[49] stringi_1.7.5     yaml_2.3.5        carData_3.0-4     mathjaxr_1.6-0   
[53] plyr_1.8.7        crayon_1.4.1      cowplot_1.1.1     haven_2.5.1      
[57] hms_1.1.2         knitr_1.40        pillar_1.8.1      ggsignif_0.6.3   
[61] codetools_0.2-18  reprex_2.0.2      glue_1.6.2        evaluate_0.16    
[65] modelr_0.1.9      vctrs_0.4.1       tzdb_0.3.0        cellranger_1.1.0 
[69] gtable_0.3.0      assertthat_0.2.1  cachem_1.0.6      xfun_0.32        
[73] mime_0.12         broom_1.0.1       rstatix_0.7.0     viridisLite_0.4.1
[77] ellipsis_0.3.2

References.

Warembourg, Charline, Léa Maitre, Ibon Tamayo-Uria, Serena Fossati, Theano Roumeliotaki, Gunn Marit Aasvang, Sandra Andrusaityte, et al. 2019. “Early-Life Environmental Exposures and Blood Pressure in Children.” Journal of the American College of Cardiology 74 (10): 1317–28. https://doi.org/10.1016/j.jacc.2019.06.069.

Warembourg, Charline, Mark Nieuwenhuijsen, Ferran Ballester, Montserrat de Castro, Leda Chatzi, Ana Esplugues, Barbara Heude, et al. 2021. “Urban Environment During Early-Life and Blood Pressure in Young Children.” Environment International 146: 106174.

Federated Lasso for feature-selection with the control of covariates in DataSHIELD (dsLassoCov)

Augusto Anguita-Ruiz*,¹,✉, Han Cao*,², Charline Warembourg³, Xavier Escriba-Montagut¹, Martine Vrijheid¹, Juan Ramón González¹, Emanuel Schwarz², and Xavier Basagaña¹

21 February 2023

