I used the preliminary NFL data from Suzanne, keeping only the most recent NFL reading for each individual. With a single observation per person there are no repeated measures, so I could use fixed effects models.
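As a minimal sketch of that filtering step, the snippet below keeps the latest reading per person in base R. The toy data and the column names `subject_id` and `lp_date` are assumptions for illustration; only `Prelim_NFL` appears in the actual analysis.

```r
# Toy data: `subject_id` and `lp_date` are hypothetical column names,
# and the values are made up for illustration.
df <- data.frame(
  subject_id = c("A", "A", "B", "C", "C", "C"),
  lp_date    = as.Date(c("2015-03-01", "2017-06-15", "2016-01-10",
                         "2014-09-30", "2016-02-20", "2018-11-05")),
  Prelim_NFL = c(610, 720, 540, 480, 530, 650)
)

# Sort by subject, then by date with the newest reading first.
df_sorted <- df[order(df$subject_id, -as.numeric(df$lp_date)), ]

# duplicated() flags every row after the first for each subject,
# so negating it keeps only the most recent reading per person.
df_latest <- df_sorted[!duplicated(df_sorted$subject_id), ]
df_latest
```

The result has exactly one row per `subject_id`, which is what lets an ordinary `lm()` treat the rows as independent.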

A quick visualization shows that people's NFL levels increase as they age, and that people with higher NFL levels are more likely to convert.

I did best subset selection with the R leaps package, searching for the combination of up to 6 variables that best predicts the preliminary NFL data. The inputs that I considered were an individual's age and each of their network composite scores.
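A best subset search of this kind can be sketched with `leaps::regsubsets()`. The data below are synthetic, the `noise1` to `noise3` columns are made up, and `method = "exhaustive"` (the package default) is an assumption, since the exact call used for the real analysis is not shown here.

```r
library(leaps)

set.seed(1)
n <- 200

# Synthetic stand-in data. Prelim_NFL, age_at_lp and the three network
# score names come from the analysis; noise1..noise3 are hypothetical.
df_sim <- data.frame(
  age_at_lp    = runif(n, 45, 85),
  SM_x_DMN     = rnorm(n, 0, 0.05),
  SM_lat_x_VIS = rnorm(n, 0, 0.05),
  DMN_x_DMN    = rnorm(n, 0, 0.05),
  noise1       = rnorm(n),
  noise2       = rnorm(n),
  noise3       = rnorm(n)
)
df_sim$Prelim_NFL <- 60 * df_sim$age_at_lp -
  5000 * df_sim$SM_x_DMN + rnorm(n, 0, 300)

# For each model size from 1 to nvmax, find the best-fitting subset.
fit  <- regsubsets(Prelim_NFL ~ ., data = df_sim,
                   nvmax = 6, method = "exhaustive")
best <- summary(fit)

best$which                            # one row per size: which terms are in
best$bic                              # BIC of the best model at each size
coef(fit, id = which.min(best$bic))   # coefficients of the lowest-BIC model
```

`regsubsets()` only reports the best subset at each size; picking the size itself still needs a criterion such as BIC or cross-validation.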

After performing 10-fold cross-validation, I found that the optimal linear model was the following:
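For reference, here is one way 10-fold cross-validation can compare candidate linear models, written in base R. It is a sketch on synthetic data with a hand-picked list of candidate formulas, not the procedure actually used to arrive at the model in this report.

```r
set.seed(42)
n <- 150

# Synthetic stand-in data using variable names from the analysis.
df_cv <- data.frame(
  age_at_lp = runif(n, 45, 85),
  SM_x_DMN  = rnorm(n, 0, 0.05),
  DMN_x_DMN = rnorm(n, 0, 0.05)
)
df_cv$Prelim_NFL <- 60 * df_cv$age_at_lp -
  5000 * df_cv$SM_x_DMN + rnorm(n, 0, 300)

# Hypothetical candidate models of increasing size.
candidates <- list(
  age_only   = Prelim_NFL ~ age_at_lp,
  age_sm     = Prelim_NFL ~ age_at_lp + SM_x_DMN,
  age_sm_dmn = Prelim_NFL ~ age_at_lp + SM_x_DMN + DMN_x_DMN
)

# Randomly assign each row to one of k folds of equal size.
k    <- 10
fold <- sample(rep(seq_len(k), length.out = n))

# For each candidate, fit on k - 1 folds and predict the held-out fold.
cv_rmse <- sapply(candidates, function(f) {
  sq_err <- numeric(n)
  for (i in seq_len(k)) {
    held_out <- fold == i
    model    <- lm(f, data = df_cv[!held_out, ])
    pred     <- predict(model, newdata = df_cv[held_out, ])
    sq_err[held_out] <- (df_cv$Prelim_NFL[held_out] - pred)^2
  }
  sqrt(mean(sq_err))
})

cv_rmse                      # out-of-sample RMSE per candidate
names(which.min(cv_rmse))    # candidate with the lowest CV error
```

Every row is predicted exactly once by a model that never saw it, so the RMSE values estimate out-of-sample error rather than in-sample fit.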

lm(Prelim_NFL ~ age_at_lp + SM_x_DMN + SM_lat_x_VIS + DMN_x_DMN, data = df.temp2)
## 
## Call:
## lm(formula = Prelim_NFL ~ age_at_lp + SM_x_DMN + SM_lat_x_VIS + 
##     DMN_x_DMN, data = df.temp2)
## 
## Coefficients:
##  (Intercept)     age_at_lp      SM_x_DMN  SM_lat_x_VIS     DMN_x_DMN  
##     -2343.16         69.24      -6942.24      -1997.79      -5956.31
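To make the fitted coefficients concrete, the sketch below plugs them into the model equation by hand. The coefficient values are copied from the output above; the helper `predict_nfl()` and the input values are hypothetical and chosen only to show the arithmetic.

```r
# Coefficients copied from the printed lm() output.
coefs <- c(
  "(Intercept)" = -2343.16,
  age_at_lp     = 69.24,
  SM_x_DMN      = -6942.24,
  SM_lat_x_VIS  = -1997.79,
  DMN_x_DMN     = -5956.31
)

# Hypothetical helper: linear predictor = intercept + sum(coef * value).
predict_nfl <- function(age_at_lp, SM_x_DMN, SM_lat_x_VIS, DMN_x_DMN) {
  x <- c(1, age_at_lp, SM_x_DMN, SM_lat_x_VIS, DMN_x_DMN)
  sum(coefs * x)
}

# Illustrative inputs: a 70-year-old with all network scores at zero.
at_70 <- predict_nfl(70, 0, 0, 0)
at_71 <- predict_nfl(71, 0, 0, 0)

at_70           # -2343.16 + 69.24 * 70
at_71 - at_70   # the age coefficient: change per additional year
```

Holding the network scores fixed, each additional year of age raises the predicted `Prelim_NFL` by the `age_at_lp` coefficient, and all three network terms enter with negative signs.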

Here is a visualization of each of these key predictors overlaid on our actual data.

And here is a visualization of the correlations between each of these variables.
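The numbers behind a correlation plot like this can be computed with `cor()`. This is a sketch on synthetic data that reuses the variable names from the model; it does not reproduce the real correlations.

```r
set.seed(7)
n <- 100

# Synthetic stand-in data with the model's variable names.
df_cor <- data.frame(
  age_at_lp    = runif(n, 45, 85),
  SM_x_DMN     = rnorm(n, 0, 0.05),
  SM_lat_x_VIS = rnorm(n, 0, 0.05),
  DMN_x_DMN    = rnorm(n, 0, 0.05)
)
df_cor$Prelim_NFL <- 60 * df_cor$age_at_lp -
  5000 * df_cor$SM_x_DMN + rnorm(n, 0, 300)

vars <- c("Prelim_NFL", "age_at_lp", "SM_x_DMN", "SM_lat_x_VIS", "DMN_x_DMN")

# Pairwise Pearson correlations; pairwise.complete.obs tolerates
# missing values by using every complete pair of observations.
cor_mat <- cor(df_cor[, vars], use = "pairwise.complete.obs")
round(cor_mat, 2)
```

Strong correlations among the predictors themselves would be worth checking, since they make the individual coefficients harder to interpret.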