* These authors contributed equally to this work.

1 Barcelona Institute for Global Health.
2 Department of Psychiatry and Psychotherapy, Heidelberg University.
3 Univ Rennes, Inserm, EHESP, Irset.

Correspondence: Augusto Anguita-Ruiz <>

1 Introduction.

The current report serves as an online companion for the manuscript “To be supplied [”Title”. Author X et al. 2022.]”. With this analysis, we demonstrate the suitability of our method “dsLassoCov” to perform feature-selection in a real scenario with complex exposome data and under the privacy-protected federated DataSHIELD system.

2 Methods.

In this section, we will describe the data employed for the showcase, giving details about the experimental design, the type of features available, and the structure and organization of the data in the Opal infrastructure.

2.1 HELIX project.

The research dataset employed here derives from the HELIX (Human Early-Life Exposome) Project. The HELIX project gathers data from 6 longitudinal-based European birth cohorts with the aim of evaluating the effect of environmental risk factors on mothers’s and children’s health. HELIX cohorts include the BIB (Born in Bradford) (United Kingdom), EDEN (Étude des Déterminants pré et postnatals du développement et de la santé de l’ENfant) (France), INMA (INfancia y Medio Ambiente) (Spain), KANC (Kaunus Cohort) (Lithuania), MoBa (Norwegian Mother and Child Cohort Study) (Norway), and Rhea (Mother-Child Cohort in Crete) (Greece). General details of the study design can be found in the Figure 1. The whole HELIX dataset includes a total of 31,472 mother-child pairs. Among them, a subcohort of 1,298 children (approximately 200 children in each cohort) was selected in this study according to the following criteria of eligibility: 1) age 6 to 11 years at the moment of outcome evaluation; 2) complete address history; and 3) no serious health problems that may affect the clinical testing or the child safety.


Figure 1. General overview of the HELIX research project.

Figure 1. General overview of the HELIX research project.


In the 1,301 children, a wide range of environmental exposures were evaluated to define the early-life exposome during two time periods: the prenatal pregnancy period and postnatal period (childhood age 6 to 11 years). Collected exposures comprise the three main parts of the exposome: outdoor exposures, chemical exposures, and lifestyle factors. All variables incorporated in the dataset have been appropriately pre-processed previous to analysis (normalized and scaled, outliers removed, and missing values imputated). Regarding phenotype data, a wide range of health outcomes are also available for the postnatal (childhood) period including outcomes related to (1) obesity and cardiometabolic health, (2) respiratory health, and (3) cognition and mental health.

HELIX Exposome data included information for 208 exposures and 88 phenotypes (Table 1).

Table 1. General description for the HELIX exposome dataset.
Cohort N Individuals N Predictors N Phenotypes
BIB_cohort 202 208 88
EDEN_cohort 198 208 88
INMA-SAB_cohort 223 208 88
KANC_cohort 204 208 88
MoBA_cohort 272 208 88
Rhea_cohort 199 208 88
Combined Studies 1298 208 88

2.2 Experimental Set-Up and research hypothesis.

The outcome of interest in the present study was the blood pressure at the postnatal period (childhood). Specifically, systolic and diastolic blood pressure values (SBP and DBP). These outcomes have been previously investigated in this population and the main findings of the research can be found in (Warembourg et al. 2019) and (Warembourg et al. 2021). In this showcase, available exposure data comprised 115 postnatal and 93 prenatal variables, respectively. Among them, we focused only in continuous exposures (77 postnatal and 69 prenatal, respectively). A separate model was run for each outcome (SBP and DBP) and each time point (prenatal and postnatal).

Nine confounders were identified from a Directed Acyclic Graph (Figure 2), including cohort of recruitment, maternal age (continuous in years), maternal educational level (low, middle, or high), self-reported maternal pre-pregnancy body mass index (continuous in kg/m2), parity (nulliparous, primiparous, or multiparous), native (if the child family is native from country of recruitment), and child age, height and sex. Cohort of recruitment was treated as six dummy variables (one for each population), and only five of them were finally included as confounders.

Figure 2. Direct Acyclic Graph design to identified confounder in the association between the early-life exposome and child blood pressure.

Figure 2. Direct Acyclic Graph design to identified confounder in the association between the early-life exposome and child blood pressure.

2.2.1 Data organization in the Opal server.

The dataset was uploaded to the Opal BRGE site hosted by the Bioinformatic Research Group in Epidemiology of ISGlobal, simulating a single-site DataSHIELD infrastructure (Figure 3). Details for accessing the server can be found in the respective DataSHIELD analysis section. Data were stored under the form of .csv (one per cohort, outcome and time point), and can be easily loaded as tables into the DataSHIELD environment. A summary of the structure and organization of the data in the Opal for the HELIX project is illustrated in Figure 3.


Figure 3. Input HELIX dataset organized by cohort in the Opal server.

Figure 3. Input HELIX dataset organized by cohort in the Opal server.


2.3 Data loading and preparation in DataSHIELD.

In this section, we show how to load the data into the DataSHIELD environment. We start by creating the connection to the opal server using an user, who have DataSHIELD permissions to Opal servers. Please, note that in our example, all datasets are hosted in the same Opal but each cohort sub-dataset is accessed separately.

# Loading DataSHIELD required packages
require(DSOpal)
require(DSI)
require(dsBaseClient)
require(dsExposomeClient)
require(dsMTLClient)
require(dsHelper)

# Loading additional required packages
require(dplyr)
require(ggplot2)
require(ggrepel)
require(tidyverse)
require(ggrepel)
require(reshape2)
require(RColorBrewer)
require(kableExtra)
require(grid)
require(gridExtra)
require(lattice)
require(ggpubr)

# Create connections
builder <- DSI::newDSLoginBuilder()
builder$append(server = "BIB", url = "https://opal.isglobal.org/repo",
               user =  "invited", password = "12345678Aa@",
               profile = "rock-lasso")
builder$append(server = "EDEN", url = "https://opal.isglobal.org/repo",
               user =  "invited", password = "12345678Aa@",
               profile = "rock-lasso")
builder$append(server = "KANC", url = "https://opal.isglobal.org/repo",
                user =  "invited", password = "12345678Aa@",
                profile = "rock-lasso")
builder$append(server = "MoBA", url = "https://opal.isglobal.org/repo",
               user =  "invited", password = "12345678Aa@",
               profile = "rock-lasso")
builder$append(server = "Rhea", url = "https://opal.isglobal.org/repo",
               user =  "invited", password = "12345678Aa@",
               profile = "rock-lasso")
builder$append(server = "INMASAB", url = "https://opal.isglobal.org/repo",
               user =  "invited", password = "12345678Aa@",
               profile = "rock-lasso")
logindata <- builder$build()
conns <- DSI::datashield.login(logins = logindata)

Then, we can load all the .csv tables available in the Opal server, corresponding to each cohort/phenotype, using the DSI::datashield.assign.table() function. This function takes the connections to the server created in the previous code chunk to assign all available objects from the the Opal, to an R object in the DataSHIELD remote session.

# We assign the resources for the DBP outcome
DSI::datashield.assign.table(conns[1], "dataDBP_PRE", "HELIX.data1DBP_pre", async = FALSE)
DSI::datashield.assign.table(conns[2], "dataDBP_PRE", "HELIX.data2DBP_pre", async = FALSE)
DSI::datashield.assign.table(conns[3], "dataDBP_PRE", "HELIX.data3DBP_pre", async = FALSE)
DSI::datashield.assign.table(conns[4], "dataDBP_PRE", "HELIX.data4DBP_pre", async = FALSE)
DSI::datashield.assign.table(conns[5], "dataDBP_PRE", "HELIX.data5DBP_pre", async = FALSE)
DSI::datashield.assign.table(conns[6], "dataDBP_PRE", "HELIX.data6DBP_pre", async = FALSE)

DSI::datashield.assign.table(conns[1], "dataDBP_POS", "HELIX.data1DBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[2], "dataDBP_POS", "HELIX.data2DBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[3], "dataDBP_POS", "HELIX.data3DBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[4], "dataDBP_POS", "HELIX.data4DBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[5], "dataDBP_POS", "HELIX.data5DBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[6], "dataDBP_POS", "HELIX.data6DBP_POS", async = FALSE)


# We assign the resources for the SBP outcome
DSI::datashield.assign.table(conns[1], "dataSBP_PRE", "HELIX.data1SBP_PRE", async = FALSE)
DSI::datashield.assign.table(conns[2], "dataSBP_PRE", "HELIX.data2SBP_PRE", async = FALSE)
DSI::datashield.assign.table(conns[3], "dataSBP_PRE", "HELIX.data3SBP_PRE", async = FALSE)
DSI::datashield.assign.table(conns[4], "dataSBP_PRE", "HELIX.data4SBP_PRE", async = FALSE)
DSI::datashield.assign.table(conns[5], "dataSBP_PRE", "HELIX.data5SBP_PRE", async = FALSE)
DSI::datashield.assign.table(conns[6], "dataSBP_PRE", "HELIX.data6SBP_PRE", async = FALSE)

DSI::datashield.assign.table(conns[1], "dataSBP_POS", "HELIX.data1SBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[2], "dataSBP_POS", "HELIX.data2SBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[3], "dataSBP_POS", "HELIX.data3SBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[4], "dataSBP_POS", "HELIX.data4SBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[5], "dataSBP_POS", "HELIX.data5SBP_POS", async = FALSE)
DSI::datashield.assign.table(conns[6], "dataSBP_POS", "HELIX.data6SBP_POS", async = FALSE)

Since dsLassoCov functions require data of each cohort to be passed as separate matrices for predictors and the outcome, we still needed to do some additional data preparation (i.e., predictor data were assigned to a matrix called “X”, and the outcome to an object called “Y”).

# Checks on the DBP dataset
ds.class("dataDBP_POS")
ds.dim("dataDBP_POS")

# Checks on the SBP dataset
ds.class("dataSBP_POS")
ds.dim("dataSBP_POS")

# Assign outcome data to a separate object called Y
ds.assign(toAssign='dataDBP_POS$Y', newobj='Y_DBP', datasources = conns)
ds.assign(toAssign='dataSBP_POS$Y', newobj='Y_SBP', datasources = conns)

# Create a vector with all ones
ds.make(toAssign = "Y_DBP-Y_DBP+1",newobj = "ONES",datasources = conns)

# Select predictors and assign them to a separate object called X (we will exclude here one of the cohorts dummy variables (Cohort EDEN) to avoid perfect collinearity between predictors)
ds.dataFrameSubset(df.name = 'dataDBP_PRE',  V1.name = "ONES",  V2.name = "ONES",  Boolean.operator = "==",keep.cols = c(1,3:84),
                   newobj = 'X_DBP_PRE',  datasources = conns)
ds.dataFrameSubset(df.name = 'dataDBP_POS',  V1.name = "ONES",  V2.name = "ONES",  Boolean.operator = "==",keep.cols = c(1,3:92),
                   newobj = 'X_DBP_POS',  datasources = conns)
ds.dataFrameSubset(df.name = 'dataSBP_PRE',  V1.name = "ONES",  V2.name = "ONES",  Boolean.operator = "==",keep.cols = c(1,3:84),
                   newobj = 'X_SBP_PRE',  datasources = conns)
ds.dataFrameSubset(df.name = 'dataSBP_POS',  V1.name = "ONES",  V2.name = "ONES",  Boolean.operator = "==",keep.cols = c(1,3:92),
                   newobj = 'X_SBP_POS',  datasources = conns)

# Checks on created files
ds.dim('X_DBP_PRE')
ds.dim('X_DBP_POS')
ds.dim('X_SBP_PRE')
ds.dim('X_SBP_POS')

ds.length('Y_DBP')
ds.length('Y_SBP')

# See covariates names:
ds.colnames('X_DBP_PRE')[[1]][1:13]
ds.colnames('X_SBP_POS')[[1]][1:13]

# We coerce both Xs and Y objects into matrix-type objects
ds.asMatrix(x.name = 'X_DBP_PRE', newobj = 'X_DBP_PRE', datasources = conns)
ds.asMatrix(x.name = 'X_DBP_POS', newobj = 'X_DBP_POS', datasources = conns)
ds.asMatrix(x.name = 'X_SBP_PRE', newobj = 'X_SBP_PRE', datasources = conns)
ds.asMatrix(x.name = 'X_SBP_POS', newobj = 'X_SBP_POS', datasources = conns)

ds.asMatrix(x.name = 'Y_DBP', newobj = 'Y_DBP', datasources = conns)
ds.asMatrix(x.name = 'Y_SBP', newobj = 'Y_SBP', datasources = conns)

# We assign objects from the remote session to R objects 
# in the client-side
X_DBP="X_DBP_PRE"; X_DBP="X_DBP_POS"; Y_DBP="Y_DBP"
X_SBP="X_SBP_PRE"; X_SBP="X_SBP_POS"; Y_SBP="Y_SBP"

3 Results.

3.1 Lasso regression on Blood pressure: estimation of optimal hyperparameters.

The first step of the analysis involved performing a 5-folds cross-validation procedure over a sequence of 50 lambda values (estimated from the data), for the identification of the optimal lambda hyperparameter, which was selected according to MSE criterion.

# The "opts" parameter allows controlling the optimization algorithm employed to minimize the sum of squared errors (SSE) (objective function) during the coefficients estimation
opts=list(); opts$init=0; opts$maxIter=100; opts$tol=0.001; opts$ter=2;
# Identification of the optimal lambda value by k-fold cross-validation for each outcome
cvResult_DBP_PRE <- ds.LassoCov_CVInSite(X=X_DBP_PRE, Y=Y_DBP, type="regress", nlambda=50, lam_ratio=0.01, 
                              opts=opts, covar=c(1:13), datasources=conns, nDigits=4, 
                              nfolds=5)
cvResult_DBP_POS <- ds.LassoCov_CVInSite(X=X_DBP_POS, Y=Y_DBP, type="regress", nlambda=50, lam_ratio=0.01, 
                              opts=opts, covar=c(1:13), datasources=conns, nDigits=4, 
                              nfolds=5)
cvResult_SBP_PRE <- ds.LassoCov_CVInSite(X=X_SBP_PRE, Y=Y_SBP, type="regress", nlambda=50, lam_ratio=0.01, 
                              opts=opts, covar=c(1:13), datasources=conns, nDigits=4, 
                              nfolds=5)
cvResult_SBP_POS <- ds.LassoCov_CVInSite(X=X_SBP_POS, Y=Y_SBP, type="regress", nlambda=50, lam_ratio=0.01, 
                              opts=opts, covar=c(1:13), datasources=conns, nDigits=4, 
                              nfolds=5)
cvResult_DBP_PRE$lambda.min
[1] 0.1263654
cvResult_DBP_POS$lambda.min
[1] 0.01969534
cvResult_SBP_PRE$lambda.min
[1] 0.04561056
cvResult_SBP_POS$lambda.min
[1] 0.02002656
  1. DBP: optimal lambda was 0.13 for prenatal and 0.02 for postnatal.
# PRENATAL
# Prepare model results for plot
dataset <- na.omit(reshape2::melt(cvResult_DBP_PRE$mse_fold))
LambdaVal <- round(apply(cvResult_DBP_PRE$lam_seq,2,mean),2)
LambdaVal <- paste(unique(dataset$Var2),"=",LambdaVal)

# Set range of colors for lambdas
fun_color_range <- colorRampPalette(c("#08737f", "#f95559"))   # Apply colorRampPalette
my_colors <- fun_color_range(50)  

# Boxplot showing the averaged MSE obtained for lambda values over folds
G1 <- ggplot(dataset,aes(x=Var1,y=value,fill=Var2))+geom_boxplot()+ labs(x = "Averaged lambda over folds",
y = "Mean squared error", fill='Lambda value') + scale_fill_manual(values=my_colors,labels = LambdaVal) + theme(legend.position = "none")+ ggtitle("DBP Prenatal")


# POSTNATAL
# Prepare model results for plot
dataset <- na.omit(reshape2::melt(cvResult_DBP_POS$mse_fold))
LambdaVal <- round(apply(cvResult_DBP_POS$lam_seq,2,mean),2)
LambdaVal <- paste(unique(dataset$Var2),"=",LambdaVal)

# Set range of colors for lambdas
fun_color_range <- colorRampPalette(c("#08737f", "#f95559"))   # Apply colorRampPalette
my_colors <- fun_color_range(50)  

# Boxplot showing the averaged MSE obtained for lambda values over folds
G2 <- ggplot(dataset,aes(x=Var1,y=value,fill=Var2))+geom_boxplot()+ labs(x = "Averaged lambda over folds",
y = "Mean squared error", fill='Lambda value') + scale_fill_manual(values=my_colors,labels = LambdaVal) + theme(legend.position = "none")+ ggtitle("DBP Postnatal")
  1. SBP: optimal lambda was 0.05 for prenatal and 0.02 for postnatal.
# PRENATAL
# Prepare model results for plot
dataset <- na.omit(reshape2::melt(cvResult_SBP_PRE$mse_fold))
LambdaVal <- round(apply(cvResult_SBP_PRE$lam_seq,2,mean),2)
LambdaVal <- paste(unique(dataset$Var2),"=",LambdaVal)

# Set range of colors for lambdas
fun_color_range <- colorRampPalette(c("#08737f", "#f95559"))   # Apply colorRampPalette
my_colors <- fun_color_range(50)  

# Boxplot showing the averaged MSE obtained for lambda values over folds
G3 <- ggplot(dataset,aes(x=Var1,y=value,fill=Var2))+geom_boxplot()+ labs(x = "Averaged lambda over folds",
y = "Mean squared error", fill='Lambda value') + scale_fill_manual(values=my_colors,labels = LambdaVal) + theme(legend.position = "none")+ ggtitle("SBP Prenatal")

# POSTNATAL
# Prepare model results for plot
dataset <- na.omit(reshape2::melt(cvResult_SBP_POS$mse_fold))
LambdaVal <- round(apply(cvResult_SBP_POS$lam_seq,2,mean),2)
LambdaVal <- paste(unique(dataset$Var2),"=",LambdaVal)

# Set range of colors for lambdas
fun_color_range <- colorRampPalette(c("#08737f", "#f95559"))   # Apply colorRampPalette
my_colors <- fun_color_range(50)  

# Boxplot showing the averaged MSE obtained for lambda values over folds
G4 <- ggplot(dataset,aes(x=Var1,y=value,fill=Var2))+geom_boxplot()+ labs(x = "Averaged lambda over folds",
y = "Mean squared error", fill='Lambda value') + scale_fill_manual(values=my_colors,labels = LambdaVal) + theme(legend.position = "none")+ ggtitle("SBP Postnatal")

ggarrange(G1, G2, G3, G4, ncol = 2, nrow = 2, labels=c("A","B","C","D"))

3.2 Optimal models: Diastolic Blood Pressure.

The optimal lambda was 0.13 for the prenatal and 0.02 for the postnatal period. Predictors 1 to 13 in the dataset were defined as the adjusting covariates.

# Solver of Lasso Regression
OptimalModel_DBP_PRE <-ds.LassoCov_Train(X=X_DBP_PRE, Y=Y_DBP,type = "regress", lambda=cvResult_DBP$lambda.min, covar=c(1:13), opts=opts, datasources=conns, nDigits=15)
names(OptimalModel_DBP_PRE$ws) <- ds.colnames("X_DBP_PRE")[[1]]
OptimalModel_DBP_POS <- ds.LassoCov_Train(X=X_DBP_POS, Y=Y_DBP,type = "regress", lambda=cvResult_DBP$lambda.min, covar=c(1:13), opts=opts, datasources=conns, nDigits=15)
names(OptimalModel_DBP_POS$ws) <- ds.colnames("X_DBP_POS")[[1]]

Below, the number and additional details for selected predictors in each model are shown.

# PRENATAL
# Get the number of selected variables
sum(OptimalModel_DBP_PRE$ws!=0)
[1] 13
# Show estimated coefficients and additional metadata for selected variables
toplot <- data.frame(OptimalModel_DBP_PRE$ws,ds.colnames("X_DBP_PRE")[[1]])
colnames(toplot) <- c("Coefficient","Index")
toplot_ <- codebook_vars[toplot[,2],c("Period","Group")]
toplot_save <- cbind(toplot,toplot_)
toplot_save <- toplot_save[which(toplot_save$Coefficient!=0),c(2,1,3,4)]
rownames(toplot_save) <- NULL
knitr::kable(toplot_save[order(abs(toplot_save$Coefficient),decreasing=T),], caption = "DBP Prenatal Optimal model") %>% row_spec(0,bold=TRUE) %>% kable_styling()
DBP Prenatal Optimal model
Index Coefficient Period Group
1 h_cohort_BIB 0.4702208 Pregnancy Key Covariates
4 h_cohort_RHEA -0.3599177 Pregnancy Key Covariates
3 h_cohort_MOBA -0.1713993 Pregnancy Key Covariates
5 h_cohort_INMA 0.1677149 Pregnancy Key Covariates
10 h_native_None -0.0888654 Pregnancy Key Covariates
13 hs_c_height_None 0.0883577 Postnatal Key Covariates
6 h_age_None 0.0602509 Pregnancy Key Covariates
8 h_mbmi_None 0.0496987 Pregnancy Key Covariates
7 h_edumc_None -0.0483261 Pregnancy Key Covariates
12 hs_child_age_days_None 0.0466813 Postnatal Key Covariates
9 h_parity_None -0.0406308 Pregnancy Key Covariates
2 h_cohort_KANC -0.0338111 Pregnancy Key Covariates
11 e3_sex_None -0.0195808 Pregnancy Key Covariates
toplot$vargroup <- rep(NA,nrow(toplot))
toplot$vargroup[which(toplot$Coefficient > 0)] <- "Positive"
toplot$vargroup[which(toplot$Coefficient == 0)] <- "Null"
toplot$vargroup[which(toplot$Coefficient < 0)] <- "Negative"

G5 <- ggplot(toplot, aes(x=Index, y=Coefficient, group = vargroup, label=Index)) +
  geom_point(size=2,aes(color = vargroup)) +
  scale_color_manual(values=c("#CD534CFF","black","#227CAD")) +
  geom_hline(yintercept=0, linetype='dashed', col = "#868686FF") +
  geom_label_repel(data         = subset(toplot, toplot$vargroup!="Null" ),
                   size          = 2,
                   box.padding   = 0.5,
                   point.padding = 0.5,
                   force         = 100,
                   segment.size  = 0.2,
                   segment.color = "grey50", max.overlaps = 8
                   ) +
  theme(text = element_text(size=6), axis.text.x = element_text(angle = 90),legend.position = "none") + ggtitle("DBP Prenatal Optimal model") + xlab("Exposures") + ylab("Coefficients")

# POSTNATAL
# Get the number of selected variables
sum(OptimalModel_DBP_POS$ws!=0)
[1] 43
# Show estimated coefficients and additional metadata for selected variables
toplot <- data.frame(OptimalModel_DBP_POS$ws,ds.colnames("X_DBP_POS")[[1]])
colnames(toplot) <- c("Coefficient","Index")
toplot_ <- codebook_vars[toplot[,2],c("Period","Group")]
toplot_save <- cbind(toplot,toplot_)
toplot_save <- toplot_save[which(toplot_save$Coefficient!=0),c(2,1,3,4)]
rownames(toplot_save) <- NULL
knitr::kable(toplot_save[order(abs(toplot_save$Coefficient),decreasing=T),], caption = "DBP Postnatal Optimal model") %>% row_spec(0,bold=TRUE) %>% 
kable_styling()
DBP Postnatal Optimal model
Index Coefficient Period Group
1 h_cohort_BIB 0.4518531 Pregnancy Key Covariates
4 h_cohort_RHEA -0.3514934 Pregnancy Key Covariates
5 h_cohort_INMA 0.1744509 Pregnancy Key Covariates
3 h_cohort_MOBA -0.1676391 Pregnancy Key Covariates
10 h_native_None -0.0866137 Pregnancy Key Covariates
13 hs_c_height_None 0.0756864 Postnatal Key Covariates
6 h_age_None 0.0661424 Pregnancy Key Covariates
28 hs_mg_c_Log2 0.0589452 Postnatal Essential minerals
20 hs_cu_c_Log2 0.0537502 Postnatal Metals
43 PSS_4_Score_None 0.0433349 Postnatal Others
12 hs_child_age_days_None 0.0396376 Postnatal Key Covariates
25 hs_mbzp_cadj_Log2 -0.0390572 Postnatal Phthalates
9 h_parity_None -0.0387779 Pregnancy Key Covariates
7 h_edumc_None -0.0380214 Pregnancy Key Covariates
31 hs_na_c_Log2 -0.0355180 Postnatal Essential minerals
27 hs_mep_cadj_Log2 -0.0347788 Postnatal Phthalates
8 h_mbmi_None 0.0325115 Pregnancy Key Covariates
21 hs_dde_cadj_Log2 -0.0315272 Postnatal OCs
2 h_cohort_KANC -0.0313477 Pregnancy Key Covariates
41 hs_tm_mt_hs_h_None -0.0302424 Postnatal Meteorological
16 hs_as_c_Log2 -0.0288183 Postnatal Metals
17 hs_bupa_cadj_Log2 -0.0270073 Postnatal Phenols
11 e3_sex_None -0.0236309 Pregnancy Key Covariates
35 hs_pbde47_cadj_Log2 0.0209138 Postnatal PBDEs
34 hs_pbde153_cadj_Log2 -0.0202860 Postnatal PBDEs
40 hs_pm25abs_yr_hs_h_Log 0.0145727 Postnatal Air Pollution
42 hs_trcs_cadj_Log2 0.0132693 Postnatal Phenols
39 hs_pfos_c_Log2 0.0116524 Postnatal PFASs
22 hs_dmp_cadj_Log2 -0.0110715 Postnatal OP Pesticides
26 hs_mehp_cadj_Log2 -0.0107084 Postnatal Phthalates
32 hs_oxominp_cadj_Log2 0.0104273 Postnatal Phthalates
15 hs_accesspoints300_h_Log -0.0086248 Postnatal Built Environment
14 h_Absorbance_Log 0.0081270 Postnatal Indoor air
37 hs_pcb138_cadj_Log2 -0.0069165 Postnatal OCs
23 hs_etpa_cadj_Log2 0.0063827 Postnatal Phenols
29 hs_mo_c_Log2 -0.0032549 Postnatal Metals
19 hs_co_c_Log2 -0.0030013 Postnatal Metals
18 hs_cd_c_Log2 0.0027984 Postnatal Metals
36 hs_pcb118_cadj_Log2 -0.0024705 Postnatal OCs
30 hs_mvpa_prd_alt_None 0.0015177 Postnatal Lifestyle
24 hs_k_c_Log2 0.0012254 Postnatal Essential minerals
38 hs_pcb170_cadj_Log2 -0.0004701 Postnatal OCs
33 hs_pb_c_Log2 -0.0001753 Postnatal Metals
toplot$vargroup <- rep(NA,nrow(toplot))
toplot$vargroup[which(toplot$Coefficient > 0)] <- "Positive"
toplot$vargroup[which(toplot$Coefficient == 0)] <- "Null"
toplot$vargroup[which(toplot$Coefficient < 0)] <- "Negative"

G6 <- ggplot(toplot, aes(x=Index, y=Coefficient, group = vargroup, label=Index)) +
  geom_point(size=2,aes(color = vargroup)) +
  scale_color_manual(values=c("#CD534CFF","black","#227CAD")) +
  geom_hline(yintercept=0, linetype='dashed', col = "#868686FF") +
  geom_label_repel(data         = subset(toplot, toplot$Coefficient > quantile(abs(toplot$Coefficient[toplot$Coefficient>0]),0.15) | toplot$Coefficient < -quantile(abs(toplot$Coefficient[toplot$Coefficient<0]) ,0.45)),
                   size          = 2,
                   box.padding   = 0.5,
                   point.padding = 0.5,
                   force         = 100,
                   segment.size  = 0.2,
                   segment.color = "grey50", max.overlaps = 8
                   ) +
  theme(text = element_text(size=6), axis.text.x = element_text(angle = 90),legend.position = "none") + ggtitle("DBP Postnatal Optimal model") + xlab("Exposures") + ylab("Coefficients")


ggarrange(G5, G6, ncol = 2, labels=c("A","B"))

The number of predictors remaining in the final models was 13 (only confounders) for the prenatal period, and 43 (13 confounders + 30 exposures) for the postnatal period. Intriguingly, in the prenatal period, a model with only adjusting covariates was enough to explain all the variance in DBP. On the other hand, in the postnatal period, the model evidenced how exposure to metals such as copper, or chemicals like PFAS and phenols was associated with higher blood pressure levels in children. Likewise, exposure to a high-stress environment (measured as perceived stress score) was associated with higher DBP. Interestingly, some urban environment factors such as the accesibility to public transport were evidenced as protective factors for high blood pressure. Among all exposures remaining in the final model as predictors of DBP, copper blood levels were one of the (non-confounder) exposures showing a higher effect size (B=0.05). Interestingly, many of these findings, especially those related to copper exposure or the effect of the urban environment on blood pressure levels were already described in our previous publications (Warembourg et al. 2019) and (Warembourg et al. 2021).

3.3 Optimal models: Systolic Blood Pressure.

The optimal lambda was 0.05 for the prenatal and 0.02 for the postnatal period. Predictors 1 to 13 in the dataset were defined as the adjusting covariates.

# Solver of Lasso Regression
OptimalModel_SBP_PRE <- ds.LassoCov_Train(X=X_SBP_PRE, Y=Y_SBP,type = "regress", lambda=cvResult_SBP$lambda.min, covar=c(1:13),  opts=opts, datasources=conns, nDigits=15)
names(OptimalModel_SBP_PRE$w) <- ds.colnames("X_SBP_PRE")[[1]]
OptimalModel_SBP_POS <- ds.LassoCov_Train(X=X_SBP_POS, Y=Y_SBP,type = "regress", lambda=cvResult_SBP$lambda.min, covar=c(1:13),  opts=opts, datasources=conns, nDigits=15)
names(OptimalModel_SBP_POS$w) <- ds.colnames("X_SBP_POS")[[1]]

Below, the number and additional details for selected predictors in each model are shown.

# PRENATAL
# Get the number of selected variables
sum(OptimalModel_SBP_PRE$ws!=0)
[1] 14
# Show estimated coefficients and additional metadata for selected variables
toplot <- data.frame(OptimalModel_SBP_PRE$ws,ds.colnames("X_SBP_PRE")[[1]])
colnames(toplot) <- c("Coefficient","Index")
toplot_ <- codebook_vars[toplot[,2],c("Period","Group")]
toplot_save <- cbind(toplot,toplot_)
toplot_save <- toplot_save[which(toplot_save$Coefficient!=0),c(2,1,3,4)]
rownames(toplot_save) <- NULL
knitr::kable(toplot_save[order(abs(toplot_save$Coefficient),decreasing = T),], caption = "SBP Prenatal Optimal model") %>% row_spec(0,bold=TRUE) %>% 
kable_styling()
SBP Prenatal Optimal model
Index Coefficient Period Group
1 h_cohort_BIB 0.5046683 Pregnancy Key Covariates
13 hs_c_height_None 0.3476911 Postnatal Key Covariates
2 h_cohort_KANC -0.3461856 Pregnancy Key Covariates
5 h_cohort_INMA 0.2611074 Pregnancy Key Covariates
3 h_cohort_MOBA -0.2104300 Pregnancy Key Covariates
4 h_cohort_RHEA -0.1598021 Pregnancy Key Covariates
6 h_age_None 0.0788615 Pregnancy Key Covariates
12 hs_child_age_days_None 0.0784751 Postnatal Key Covariates
8 h_mbmi_None 0.0772208 Pregnancy Key Covariates
10 h_native_None -0.0583598 Pregnancy Key Covariates
7 h_edumc_None -0.0477134 Pregnancy Key Covariates
11 e3_sex_None 0.0385127 Pregnancy Key Covariates
14 h_fdensity300_preg_Log -0.0198690 Pregnancy Built Environment
9 h_parity_None -0.0110181 Pregnancy Key Covariates
toplot$vargroup <- rep(NA,nrow(toplot))
toplot$vargroup[which(toplot$Coefficient > 0)] <- "Positive"
toplot$vargroup[which(toplot$Coefficient == 0)] <- "Null"
toplot$vargroup[which(toplot$Coefficient < 0)] <- "Negative"

G7 <- ggplot(toplot, aes(x=Index, y=Coefficient, group = vargroup, label=Index)) +
  geom_point(size=2,aes(color = vargroup)) +
  scale_color_manual(values=c("#CD534CFF","black","#227CAD")) +
  geom_hline(yintercept=0, linetype='dashed', col = "#868686FF") +
  geom_label_repel(data         = subset(toplot, toplot$vargroup!="Null" ),
                   size          = 2,
                   box.padding   = 0.5,
                   point.padding = 0.5,
                   force         = 100,
                   segment.size  = 0.2,
                   segment.color = "grey50", max.overlaps = 8
                   ) +
  theme(text = element_text(size=6), axis.text.x = element_text(angle = 90),legend.position = "none") + ggtitle("SBP Prenatal Optimal model") + xlab("Exposures") + ylab("Coefficients")

# POSTNATAL
# Get the number of selected variables
sum(OptimalModel_SBP_POS$ws!=0)
[1] 39
# Show estimated coefficients and additional metadata for selected variables
toplot <- data.frame(OptimalModel_SBP_POS$ws,ds.colnames("X_SBP_POS")[[1]])
colnames(toplot) <- c("Coefficient","Index")
toplot_ <- codebook_vars[toplot[,2],c("Period","Group")]
toplot_save <- cbind(toplot,toplot_)
toplot_save <- toplot_save[which(toplot_save$Coefficient!=0),c(2,1,3,4)]
rownames(toplot_save) <- NULL
knitr::kable(toplot_save[order(abs(toplot_save$Coefficient),decreasing = T),], caption = "SBP Postnatal Optimal model") %>% row_spec(0,bold=TRUE) %>% 
kable_styling()
SBP Postnatal Optimal model
Index Coefficient Period Group
1 h_cohort_BIB 0.4363117 Pregnancy Key Covariates
13 hs_c_height_None 0.3183230 Postnatal Key Covariates
2 h_cohort_KANC -0.2730910 Pregnancy Key Covariates
5 h_cohort_INMA 0.2488328 Pregnancy Key Covariates
3 h_cohort_MOBA -0.2245687 Pregnancy Key Covariates
4 h_cohort_RHEA -0.1375123 Pregnancy Key Covariates
6 h_age_None 0.0887953 Pregnancy Key Covariates
21 hs_dde_cadj_Log2 -0.0750493 Postnatal OCs
10 h_native_None -0.0669917 Pregnancy Key Covariates
8 h_mbmi_None 0.0607844 Pregnancy Key Covariates
12 hs_child_age_days_None 0.0564032 Postnatal Key Covariates
31 hs_mg_c_Log2 0.0531831 Postnatal Essential minerals
11 e3_sex_None 0.0486127 Pregnancy Key Covariates
27 hs_hcb_cadj_Log2 -0.0481409 Postnatal OCs
39 PSS_4_Score_None 0.0380141 Postnatal Others
28 hs_mbzp_cadj_Log2 -0.0281945 Postnatal Phthalates
17 hs_as_c_Log2 -0.0265266 Postnatal Metals
33 hs_pbde153_cadj_Log2 -0.0260163 Postnatal PBDEs
37 hs_pfunda_c_Log2 0.0239332 Postnatal PFASs
7 h_edumc_None -0.0232285 Pregnancy Key Covariates
24 hs_dmtp_cadj_Log2 -0.0217748 Postnatal OP Pesticides
32 hs_na_c_Log2 -0.0208633 Postnatal Essential minerals
14 h_Absorbance_Log 0.0197715 Postnatal Indoor air
29 hs_mehp_cadj_Log2 -0.0187647 Postnatal Phthalates
36 hs_pfoa_c_Log2 0.0108690 Postnatal PFASs
20 hs_cu_c_Log2 0.0105436 Postnatal Metals
34 hs_pbde47_cadj_Log2 0.0082825 Postnatal PBDEs
9 h_parity_None -0.0080572 Pregnancy Key Covariates
30 hs_mep_cadj_Log2 -0.0080463 Postnatal Phthalates
16 hs_accesspoints300_h_Log -0.0071812 Postnatal Built Environment
38 hs_pm25abs_yr_hs_h_Log 0.0068807 Postnatal Air Pollution
15 h_Benzene_Log 0.0058062 Postnatal Indoor air
18 hs_bupa_cadj_Log2 -0.0049075 Postnatal Phenols
23 hs_dmp_cadj_Log2 -0.0034749 Postnatal OP Pesticides
26 hs_frichness300_h_None -0.0032700 Postnatal Built Environment
35 hs_pfna_c_Log2 0.0022736 Postnatal PFASs
25 hs_etpa_cadj_Log2 0.0019301 Postnatal Phenols
22 hs_dep_cadj_Log2 -0.0019300 Postnatal OP Pesticides
19 hs_cd_c_Log2 0.0000674 Postnatal Metals
toplot$vargroup <- rep(NA,nrow(toplot))
toplot$vargroup[which(toplot$Coefficient > 0)] <- "Positive"
toplot$vargroup[which(toplot$Coefficient == 0)] <- "Null"
toplot$vargroup[which(toplot$Coefficient < 0)] <- "Negative"

G8 <- ggplot(toplot, aes(x=Index, y=Coefficient, group = vargroup, label=Index)) +
  geom_point(size=2,aes(color = vargroup)) +
  scale_color_manual(values=c("#CD534CFF","black","#227CAD")) +
  geom_hline(yintercept=0, linetype='dashed', col = "#868686FF") +
  geom_label_repel(data         = subset(toplot, toplot$Coefficient > quantile(abs(toplot$Coefficient[toplot$Coefficient>0]),0.15) | toplot$Coefficient < -quantile(abs(toplot$Coefficient[toplot$Coefficient<0]) ,0.45)),
                   size          = 2,
                   box.padding   = 0.5,
                   point.padding = 0.5,
                   force         = 100,
                   segment.size  = 0.2,
                   segment.color = "grey50", max.overlaps = 8
                   ) +
  theme(text = element_text(size=6), axis.text.x = element_text(angle = 90),legend.position = "none") + ggtitle("SBP Postnatal Optimal model") + xlab("Exposures") + ylab("Coefficients")

ggarrange(G7, G8, ncol = 2, labels=c("A","B"))

The number of predictors remaining in the final models was 14 (13 confounders + 1 exposure) for the prenatal period, and 39 (13 confounders + 26 exposures) for the postnatal period. Interestingly, a higher number of facility types near children’s residences in both pregnancy and postnatal periods was associated with lower SBP values (this was already observed by (Warembourg et al. 2021)). For the postnatal period, results were pretty concordant with those extracted for DBP, remaining the most interesting associations, such as those revealed for high-stress environment or copper blood levels.

In both cases (DBP and SBP), unadjusted models revealed similar findings to adjusted models in terms of selected variables and direction of associations. Nevertheless, in comparison to dsLassoCov, the uncorrected dsLasso approach yielded less sparse (and therefore less interpretable) models as well as selected some unexpected variables with extremely high effect sizes and underestimated the effect of some other exposures, such as those identified as key confounders. For example, in the case of the postnatal SBP unadjusted model, the blood levels of hexachlorobenzene were selected as one of the exposures with the highest effect sizes, even above key known-exposures such as the cohort of recruitment or the maternal BMI. Likewise, unadjusted models evidenced unexpected associations such as an inverse proportional relationship between the total traffic load of all roads in 100 m buffer at home and SBP. This is an unexpected finding since previous studies have shown how noise and air pollution in cities is a well-known trigger of high blood pressure and cardiovascular disease.

# Get the number of selected variables
sum(OptimalModel_SBP_POS_nocovs$w!=0)
[1] 49
# Show estimated coefficients and additional metadata for selected variables
toplot <- data.frame(OptimalModel_SBP_POS_nocovs$w,ds.colnames("X_SBP_POS")[[1]])
colnames(toplot) <- c("Coefficient","Index")
toplot_ <- codebook_vars[toplot[,2],c("Period","Group")]
toplot_save <- cbind(toplot,toplot_)
toplot_save <- toplot_save[which(toplot_save$Coefficient!=0),c(2,1,3,4)]
rownames(toplot_save) <- NULL
knitr::kable(toplot_save[order(abs(toplot_save$Coefficient),decreasing=T),], caption = "SBP Postnatal Unadjusted Optimal model") %>% row_spec(0,bold=TRUE) %>% 
kable_styling()
SBP Postnatal Unadjusted Optimal model
Index Coefficient Period Group
9 hs_c_height_None 0.2287006 Postnatal Key Covariates
25 hs_hcb_cadj_Log2 -0.1448063 Postnatal OCs
19 hs_dde_cadj_Log2 -0.0938842 Postnatal OCs
8 hs_child_age_days_None 0.0751858 Postnatal Key Covariates
6 h_native_None -0.0684704 Pregnancy Key Covariates
32 hs_mg_c_Log2 0.0583768 Postnatal Essential minerals
3 h_age_None 0.0561176 Pregnancy Key Covariates
49 PSS_4_Score_None 0.0538807 Postnatal Others
5 h_mbmi_None 0.0518649 Pregnancy Key Covariates
28 hs_mbzp_cadj_Log2 -0.0473453 Postnatal Phthalates
37 hs_no2_yr_hs_h_Log 0.0452616 Postnatal Air Pollution
45 hs_pm25abs_yr_hs_h_Log 0.0404114 Postnatal Air Pollution
42 hs_pfoa_c_Log2 0.0397292 Postnatal PFASs
11 h_NO2_Log 0.0369238 Postnatal Indoor air
7 e3_sex_None 0.0339399 Pregnancy Key Covariates
13 hs_accesspoints300_h_Log 0.0242645 Postnatal Built Environment
47 hs_trafload_h_pow1over3 -0.0226582 Postnatal Traffic
10 h_Absorbance_Log 0.0197312 Postnatal Indoor air
40 hs_pbde47_cadj_Log2 0.0185071 Postnatal PBDEs
39 hs_pbde153_cadj_Log2 -0.0176491 Postnatal PBDEs
18 hs_cu_c_Log2 0.0174570 Postnatal Metals
20 hs_dmp_cadj_Log2 -0.0173723 Postnatal OP Pesticides
4 h_edumc_None -0.0156090 Pregnancy Key Covariates
34 hs_mnbp_cadj_Log2 -0.0151162 Postnatal Phthalates
30 hs_mehp_cadj_Log2 -0.0145265 Postnatal Phthalates
26 hs_hg_c_Log2 0.0142944 Postnatal Metals
15 hs_bupa_cadj_Log2 -0.0138145 Postnatal Phenols
24 hs_frichness300_h_None -0.0137729 Postnatal Built Environment
21 hs_dmtp_cadj_Log2 -0.0135055 Postnatal OP Pesticides
27 hs_landuseshan300_h_None 0.0108409 Postnatal Built Environment
22 hs_etpa_cadj_Log2 0.0100900 Postnatal Phenols
31 hs_mepa_cadj_Log2 0.0099626 Postnatal Phenols
14 hs_as_c_Log2 -0.0098313 Postnatal Metals
46 hs_se_c_Log2 0.0088805 Postnatal Essential minerals
36 hs_ndvi100_h_None -0.0082579 Postnatal Natural Spaces
43 hs_pfunda_c_Log2 0.0063179 Postnatal PFASs
35 hs_na_c_Log2 -0.0042086 Postnatal Essential minerals
33 hs_mibp_cadj_Log2 0.0040371 Postnatal Phthalates
41 hs_pfhxs_c_Log2 0.0028548 Postnatal PFASs
2 h_cohort_MOBA -0.0025974 Pregnancy Key Covariates
1 h_cohort_BIB 0.0025684 Pregnancy Key Covariates
38 hs_ohminp_cadj_Log2 0.0011654 Postnatal Phthalates
17 hs_cs_c_Log2 0.0007918 Postnatal Metals
44 hs_pm25_yr_hs_h_None 0.0007645 Postnatal Air Pollution
48 hs_uvdvf_mt_hs_h_None 0.0006950 Postnatal Meteorological
12 h_TEX_Log -0.0006882 Postnatal Indoor air
23 hs_fdensity300_h_Log -0.0003913 Postnatal Built Environment
16 hs_cd_c_Log2 0.0000234 Postnatal Metals
29 hs_mecpp_cadj_Log2 -0.0000119 Postnatal Phthalates

4 Conclusion.

With this showcase, we thereby demonstrate that our method dsLassoCov and the DataSHIELD infrastructure are powerful tools for performing feature selection in complex scenarios such as the case of exposome research datasets, in which confounding effects can be strong biasing research findings. The fact of two papers already published with a similar approach showing coincident results ((Warembourg et al. 2019) and (Warembourg et al. 2021)) also add robustness to our statements. Specially in the case of sensitive multi-centre biomedical research studies, the use of federated-analysis platforms such as DataSHIELD should be promoted.

datashield.logout(conns)

5 Session Info.

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ggpubr_0.4.0           lattice_0.20-45        gridExtra_2.3         
 [4] kableExtra_1.3.4       RColorBrewer_1.1-3     reshape2_1.4.4        
 [7] forcats_0.5.2          stringr_1.4.0          purrr_0.3.4           
[10] readr_2.1.2            tidyr_1.2.1            tibble_3.1.5          
[13] tidyverse_1.3.1        ggrepel_0.9.1          ggplot2_3.3.6         
[16] dplyr_1.0.10           dsHelper_0.4.18.9000   dsMTLClient_0.9       
[19] corpcor_1.6.9          dsExposomeClient_2.0.4 dsBaseClient_6.1.1    
[22] DSOpal_1.3.1           DSI_1.4.0              R6_2.5.1              
[25] progress_1.2.2         opalr_3.1.1            httr_1.4.4            

loaded via a namespace (and not attached):
 [1] nlme_3.1-159      fs_1.5.2          lubridate_1.8.0   webshot_0.5.3    
 [5] tools_4.2.0       backports_1.2.1   bslib_0.4.0       utf8_1.2.2       
 [9] metafor_3.8-1     DBI_1.1.2         colorspace_2.0-3  withr_2.5.0      
[13] tidyselect_1.1.2  prettyunits_1.1.1 curl_4.3.2        compiler_4.2.0   
[17] cli_3.4.0         rvest_1.0.2       xml2_1.3.3        labeling_0.4.2   
[21] sass_0.4.2        scales_1.2.1      systemfonts_1.0.4 digest_0.6.28    
[25] rmarkdown_2.16    svglite_2.1.0     pkgconfig_2.0.3   htmltools_0.5.3  
[29] labelled_2.9.1    dbplyr_2.2.1      fastmap_1.1.0     highr_0.9        
[33] rlang_1.0.5       readxl_1.4.1      rstudioapi_0.14   farver_2.1.1     
[37] jquerylib_0.1.4   generics_0.1.2    jsonlite_1.7.2    car_3.1-0        
[41] magrittr_2.0.1    metadat_1.2-0     Matrix_1.4-1      Rcpp_1.0.9       
[45] munsell_0.5.0     fansi_0.5.0       abind_1.4-5       lifecycle_1.0.1  
[49] stringi_1.7.5     yaml_2.3.5        carData_3.0-4     mathjaxr_1.6-0   
[53] plyr_1.8.7        crayon_1.4.1      cowplot_1.1.1     haven_2.5.1      
[57] hms_1.1.2         knitr_1.40        pillar_1.8.1      ggsignif_0.6.3   
[61] codetools_0.2-18  reprex_2.0.2      glue_1.6.2        evaluate_0.16    
[65] modelr_0.1.9      vctrs_0.4.1       tzdb_0.3.0        cellranger_1.1.0 
[69] gtable_0.3.0      assertthat_0.2.1  cachem_1.0.6      xfun_0.32        
[73] mime_0.12         broom_1.0.1       rstatix_0.7.0     viridisLite_0.4.1
[77] ellipsis_0.3.2   

References.

Warembourg, Charline, Léa Maitre, Ibon Tamayo-Uria, Serena Fossati, Theano Roumeliotaki, Gunn Marit Aasvang, Sandra Andrusaityte, et al. 2019. “Early-Life Environmental Exposures and Blood Pressure in Children.” Journal of the American College of Cardiology 74 (10): 1317–28. https://doi.org/https://doi.org/10.1016/j.jacc.2019.06.069.
Warembourg, Charline, Mark Nieuwenhuijsen, Ferran Ballester, Montserrat de Castro, Leda Chatzi, Ana Esplugues, Barbara Heude, et al. 2021. “Urban Environment During Early-Life and Blood Pressure in Young Children.” Environ. Int. 146 (106174): 106174.