I used the preliminary NFL data from Suzanne. I kept only the most recent NFL reading for each individual, so that each person contributes a single observation and I could use fixed-effects models.
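A minimal sketch of that filtering step, assuming the readings are stored in long format with one row per reading. The data frame `nfl_long` and its columns `id` and `visit_date` are hypothetical names and made-up values for illustration, not the actual dataset.

```r
# Hypothetical long-format data: one row per NFL reading (values are made up).
nfl_long <- data.frame(
  id         = c("A", "A", "B", "C", "C", "C"),
  visit_date = as.Date(c("2015-01-10", "2017-03-02", "2016-06-15",
                         "2014-09-01", "2016-09-01", "2018-09-01")),
  Prelim_NFL = c(900, 1100, 750, 1200, 1350, 1500)
)

# Sort by individual, then by date with the newest reading first.
ord <- order(nfl_long$id, -as.numeric(nfl_long$visit_date))
nfl_sorted <- nfl_long[ord, ]

# duplicated() flags every row after the first per id, so negating it
# keeps exactly the most recent reading for each individual.
nfl_latest <- nfl_sorted[!duplicated(nfl_sorted$id), ]
print(nfl_latest)
```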
A quick visualization shows that people’s NFL levels increase as they age, and that people with higher NFL levels are more likely to convert.
I did best subset selection using the R leaps package, searching for the best combination of up to 6 variables to predict the preliminary NFL data. The candidate inputs were an individual’s age and each of their network composite scores.
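The subset search can be sketched roughly as follows with `regsubsets()` from leaps. The data here are simulated stand-ins: only some column names echo the model reported in this document, `noise_1` and `noise_2` are hypothetical extra candidates, and BIC is used as one possible ranking criterion, not necessarily the one used in the analysis.

```r
library(leaps)

set.seed(1)
n <- 200
# Simulated stand-in data (not the real NFL data).
sim <- data.frame(
  age_at_lp    = rnorm(n, 60, 8),
  SM_x_DMN     = rnorm(n, 0, 0.05),
  SM_lat_x_VIS = rnorm(n, 0, 0.05),
  DMN_x_DMN    = rnorm(n, 0.3, 0.05),
  noise_1      = rnorm(n),  # hypothetical extra candidate
  noise_2      = rnorm(n)   # hypothetical extra candidate
)
sim$Prelim_NFL <- 50 * sim$age_at_lp - 5000 * sim$SM_x_DMN + rnorm(n, 0, 200)

# Search all subsets of up to 6 predictors; the best model of each size is kept.
fit  <- regsubsets(Prelim_NFL ~ ., data = sim, nvmax = 6, method = "exhaustive")
smry <- summary(fit)

# Row k of smry$which marks the predictors in the best k-variable model.
print(smry$which)

# Pick the subset size with the lowest BIC and show its coefficients.
best_size <- which.min(smry$bic)
print(coef(fit, best_size))
```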
After performing 10-fold cross-validation, I found that the optimal linear model was:
lm(Prelim_NFL ~ age_at_lp + SM_x_DMN + SM_lat_x_VIS + DMN_x_DMN, data = df.temp2)
##
## Call:
## lm(formula = Prelim_NFL ~ age_at_lp + SM_x_DMN + SM_lat_x_VIS +
##     DMN_x_DMN, data = df.temp2)
##
## Coefficients:
##  (Intercept)     age_at_lp      SM_x_DMN  SM_lat_x_VIS     DMN_x_DMN
##     -2343.16         69.24      -6942.24      -1997.79      -5956.31
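The 10-fold cross-validation used to choose the model can be sketched as below. This is one common way to do it, assuming the subset search is rerun inside each fold and model sizes are compared by mean squared prediction error; the actual procedure may have differed. The data and the names `x1` to `x6` and `y` are simulated placeholders.

```r
library(leaps)

set.seed(2)
n <- 200
# Simulated placeholder data: only x1 and x2 truly affect y.
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                  x4 = rnorm(n), x5 = rnorm(n), x6 = rnorm(n))
sim$y <- 3 * sim$x1 - 2 * sim$x2 + rnorm(n)

k        <- 10
max_size <- 6
folds    <- sample(rep(1:k, length.out = n))  # random fold label per row
cv_err   <- matrix(NA, nrow = k, ncol = max_size)

for (j in 1:k) {
  train    <- sim[folds != j, ]
  held_out <- sim[folds == j, ]

  # Redo the subset search on the training folds only.
  fit <- regsubsets(y ~ ., data = train, nvmax = max_size)

  # regsubsets has no predict method, so build the design matrix by hand.
  X_out <- model.matrix(y ~ ., data = held_out)
  for (s in 1:max_size) {
    b    <- coef(fit, id = s)
    pred <- X_out[, names(b), drop = FALSE] %*% b
    cv_err[j, s] <- mean((held_out$y - pred)^2)
  }
}

# Average the error over folds and pick the size that minimizes it.
mean_cv   <- colMeans(cv_err)
best_size <- which.min(mean_cv)
print(round(mean_cv, 3))
print(best_size)
```

Once a size is chosen this way, the best model of that size would be refit on the full data, which is what an `lm()` call like the one above reports.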
Here is a visualization of each of these key predictors overlaid on our actual data.
And here is a visualization of the pairwise correlations among these variables.
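The correlation matrix behind such a plot can be computed with base R's `cor()`. This is only a sketch on simulated values: the column names match the fitted model, but the numbers are random placeholders.

```r
set.seed(3)
n <- 100
# Simulated placeholder values for the response and the four predictors.
sim <- data.frame(
  Prelim_NFL   = rnorm(n, 1000, 300),
  age_at_lp    = rnorm(n, 60, 8),
  SM_x_DMN     = rnorm(n, 0, 0.05),
  SM_lat_x_VIS = rnorm(n, 0, 0.05),
  DMN_x_DMN    = rnorm(n, 0.3, 0.05)
)

# Pairwise Pearson correlations; pairwise.complete.obs tolerates missing values.
cor_mat <- cor(sim, use = "pairwise.complete.obs")
print(round(cor_mat, 2))
```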