1.

Revisit how we are thinking about engagement and interest. For now, let’s just look at a correlation matrix between the ESM variables concentration, hard work, enjoyment, and interest. As all of these could potentially be in an engagement measure, let’s look at what the measure would look like with just concentration and hard work vs. concentration, hard work, and interest vs. concentration, hard work, and enjoyment vs. all four. Also, is there any rationale for putting interest and enjoyment together and keeping this pair separate from the concentration and hard work pair? Here I just want us to understand our data better. What happens when all of these are put in a factor analysis?

Presently, here’s how our engagement measures are made:

Overall engagement: Hard working, concentrating, enjoying
Cognitive engagement: Important to future goals, important to you
Affective engagement: Enjoying, interesting
Behavioral engagement: Hard working, concentrating
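
As a minimal sketch of how these four composites could be computed: it assumes each composite is the row mean of its items (the notes above do not say whether means or sums are used) and uses a small made-up data frame, `esm_toy`, in place of `esm`. Only the column names come from the analysis.

```r
# Toy stand-in for `esm`; the values are made up, only the column names follow the notes.
esm_toy <- data.frame(
    concentrating = c(1, 3, 4),
    hard_working  = c(2, 3, 4),
    enjoy         = c(1, 4, 3),
    interest      = c(2, 4, 4),
    important     = c(1, 2, 4),
    future_goals  = c(1, 2, 3)
)

# Assumption: each composite is the row mean of its items.
composites <- data.frame(
    overall_engagement    = rowMeans(esm_toy[, c("hard_working", "concentrating", "enjoy")]),
    cognitive_engagement  = rowMeans(esm_toy[, c("future_goals", "important")]),
    affective_engagement  = rowMeans(esm_toy[, c("enjoy", "interest")]),
    behavioral_engagement = rowMeans(esm_toy[, c("hard_working", "concentrating")])
)
composites
```

Because enjoy feeds both the overall and the affective composite, and hard_working and concentrating feed both the overall and the behavioral composite, these scores overlap by construction.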

Let’s look at just four variables, concentrating, hard working, enjoying, and interesting

Correlation matrix

library(dplyr)  # select(), %>%
library(corrr)  # correlate(), shave(), fashion()

ss <- esm %>% 
    dplyr::select(concentrating, hard_working, enjoy, interest)

ss %>% 
    correlate() %>% 
    shave() %>% 
    fashion()

##         rowname concentrating hard_working enjoy interest
## 1 concentrating                                          
## 2  hard_working           .62                            
## 3         enjoy           .59          .57               
## 4      interest           .57          .54   .66

Factor analysis

This seems to suggest there is one factor (two does not seem to fit):

Note that the PA1 column holds the factor loadings (the factor is labeled PA1 because fm = "pa").

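determine_n_factors <- function(d) {
    # Keep only rows with no missing values on any item
    d <- d[complete.cases(d), ]
    # Eigenvalues of the observed correlation matrix
    ev <- eigen(cor(d))
    # Parallel analysis: eigenvalues from random data of the same dimensions
    ap <- nFactors::parallel(subject = nrow(d), var = ncol(d),
                             rep = 100, cent = .05)
    # Scree-based retention criteria, plotted together
    nS <- nFactors::nScree(x = ev$values, aparallel = ap$eigen$qevpea)
    nFactors::plotnScree(nS)
}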

determine_n_factors(ss)
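
To make the logic behind the parallel-analysis part of this check more concrete, here is a minimal base-R sketch. It is not the nFactors implementation: the function name `parallel_sketch`, its arguments, the choice of the 95th centile as the threshold, and the toy data are all assumptions made for illustration.

```r
# Sketch of parallel analysis: compare observed eigenvalues with those from random data.
parallel_sketch <- function(d, n_rep = 100, centile = 0.95, seed = 1) {
    d <- d[complete.cases(d), , drop = FALSE]
    observed <- eigen(cor(d))$values
    set.seed(seed)
    # Eigenvalues of correlation matrices of uncorrelated normal data of the same size
    random <- replicate(n_rep, {
        sim <- matrix(rnorm(nrow(d) * ncol(d)), nrow = nrow(d))
        eigen(cor(sim))$values
    })
    # Assumed threshold: the chosen centile of the random eigenvalues, position by position
    threshold <- apply(random, 1, quantile, probs = centile)
    # Count the leading eigenvalues that exceed their random threshold
    exceeds <- observed > threshold
    n_factors <- if (all(exceeds)) length(exceeds) else which(!exceeds)[1] - 1
    list(observed = observed, threshold = threshold, n_factors = n_factors)
}

# Toy data with a single common factor (made up for illustration)
set.seed(42)
f <- rnorm(500)
toy <- data.frame(a = f + rnorm(500), b = f + rnorm(500),
                  c = f + rnorm(500), d = f + rnorm(500))
parallel_sketch(toy)$n_factors
```

A factor is retained only while the observed eigenvalue is larger than what random, uncorrelated data of the same size would produce.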

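ssc <- ss[complete.cases(ss), ]
library(psych)
# Note: the rotation argument of fa() is named `rotate`, not `rotation`.
# With a single factor there is nothing to rotate, so this does not change the result.
fit1 <- fa(ssc, nfactors=1, rotation="Promax", fm = "pa")
fit1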

## Factor Analysis using method =  pa
## Call: fa(r = ssc, nfactors = 1, fm = "pa", rotation = "Promax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                PA1   h2   u2 com
## concentrating 0.78 0.60 0.40   1
## hard_working  0.74 0.55 0.45   1
## enjoy         0.80 0.64 0.36   1
## interest      0.77 0.59 0.41   1
## 
##                 PA1
## SS loadings    2.37
## Proportion Var 0.59
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  6  and the objective function was  1.72 with Chi Square of  5073.69
## The degrees of freedom for the model are 2  and the objective function was  0.04 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic number of observations is  2957 with the empirical chi square  40.13  with prob <  1.9e-09 
## The total number of observations was  2957  with Likelihood Chi Square =  113.95  with prob <  1.8e-25 
## 
## Tucker Lewis Index of factoring reliability =  0.934
## RMSEA index =  0.138  and the 90 % confidence intervals are  0.117 0.16
## BIC =  97.97
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA1
## Correlation of (regression) scores with factors   0.92
## Multiple R square of scores with factors          0.85
## Minimum correlation of possible factor scores     0.71

# fit2 <- fa(ssc, nfactors=2, rotation="Promax", fm = "pa")
# fit2

Reliability

psych::alpha(ssc)

## 
## Reliability analysis   
## Call: psych::alpha(x = ssc)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd
##       0.85      0.85    0.82      0.59 5.8 0.0044  2.9 0.87
## 
##  lower alpha upper     95% confidence boundaries
## 0.84 0.85 0.86 
## 
##  Reliability if an item is dropped:
##               raw_alpha std.alpha G6(smc) average_r S/N alpha se
## concentrating      0.81      0.81    0.75      0.59 4.3   0.0060
## hard_working       0.82      0.82    0.76      0.61 4.7   0.0056
## enjoy              0.80      0.80    0.74      0.58 4.1   0.0063
## interest           0.81      0.82    0.75      0.60 4.4   0.0059
## 
##  Item statistics 
##                  n raw.r std.r r.cor r.drop mean  sd
## concentrating 2957  0.83  0.84  0.76   0.70  2.9 1.0
## hard_working  2957  0.82  0.82  0.73   0.67  2.9 1.0
## enjoy         2957  0.85  0.85  0.78   0.72  2.8 1.1
## interest      2957  0.83  0.83  0.75   0.69  2.9 1.1
## 
## Non missing response frequency for each item
##                  1   2    3    4 miss
## concentrating 0.12 0.2 0.33 0.35    0
## hard_working  0.14 0.2 0.31 0.35    0
## enjoy         0.15 0.2 0.31 0.33    0
## interest      0.14 0.2 0.30 0.37    0
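
As a quick check on how the std.alpha value relates to average_r, the standardized alpha can be reproduced from the number of items k and the average inter-item correlation with the formula k * r / (1 + (k - 1) * r). The helper name `std_alpha` is hypothetical, and the inputs are the rounded values printed above.

```r
# Standardized alpha from the number of items k and the average inter-item correlation
std_alpha <- function(k, average_r) {
    (k * average_r) / (1 + (k - 1) * average_r)
}

# k = 4 items, average_r = 0.59 as printed above (rounded to two decimals)
round(std_alpha(4, 0.59), 2)
# 0.85
```

The formula also shows why alpha rises as items are added even when the average correlation stays the same or drops slightly.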

Let’s look at all six variables, concentrating, hard working, enjoying, interesting, important to goals, and important to you

ss1 <- esm %>% 
    dplyr::select(concentrating, hard_working, enjoy, interest, important, future_goals)

ss1 %>% 
    correlate() %>% 
    shave() %>% 
    fashion()

##         rowname concentrating hard_working enjoy interest important future_goals
## 1 concentrating
## 2  hard_working           .62
## 3         enjoy           .59          .57
## 4      interest           .57          .54   .66
## 5     important           .50          .49   .54      .56
## 6  future_goals           .41          .44   .49      .51       .62

determine_n_factors(ss1)

ssc <- ss1[complete.cases(ss1), ]
# One-factor solution for all six items (psych is already loaded above)
fit1 <- fa(ssc, nfactors=1, rotation="Promax", fm = "pa")
fit1

## Factor Analysis using method =  pa
## Call: fa(r = ssc, nfactors = 1, fm = "pa", rotation = "Promax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                PA1   h2   u2 com
## concentrating 0.74 0.54 0.46   1
## hard_working  0.72 0.52 0.48   1
## enjoy         0.79 0.62 0.38   1
## interest      0.78 0.61 0.39   1
## important     0.73 0.54 0.46   1
## future_goals  0.66 0.43 0.57   1
## 
##                 PA1
## SS loadings    3.26
## Proportion Var 0.54
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  15  and the objective function was  2.81 with Chi Square of  8302.02
## The degrees of freedom for the model are 9  and the objective function was  0.17 
## 
## The root mean square of the residuals (RMSR) is  0.05 
## The df corrected root mean square of the residuals is  0.07 
## 
## The harmonic number of observations is  2956 with the empirical chi square  255.92  with prob <  5.6e-50 
## The total number of observations was  2956  with Likelihood Chi Square =  510.05  with prob <  4.1e-104 
## 
## Tucker Lewis Index of factoring reliability =  0.899
## RMSEA index =  0.137  and the 90 % confidence intervals are  0.127 0.148
## BIC =  438.12
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    PA1
## Correlation of (regression) scores with factors   0.94
## Multiple R square of scores with factors          0.88
## Minimum correlation of possible factor scores     0.76

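# Two-factor solution. No rotation is specified, so fa() uses its default oblique
# rotation (oblimin), which is why factor correlations are reported below.
fit2 <- fa(ssc, nfactors=2, fm = "pa")
fit2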

## Factor Analysis using method =  pa
## Call: fa(r = ssc, nfactors = 2, fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                 PA1   PA2   h2   u2 com
## concentrating  0.87 -0.10 0.64 0.36 1.0
## hard_working   0.74  0.02 0.56 0.44 1.0
## enjoy          0.67  0.15 0.62 0.38 1.1
## interest       0.57  0.25 0.60 0.40 1.4
## important      0.14  0.69 0.64 0.36 1.1
## future_goals  -0.04  0.81 0.61 0.39 1.0
## 
##                        PA1  PA2
## SS loadings           2.26 1.40
## Proportion Var        0.38 0.23
## Cumulative Var        0.38 0.61
## Proportion Explained  0.62 0.38
## Cumulative Proportion 0.62 1.00
## 
##  With factor correlations of 
##      PA1  PA2
## PA1 1.00 0.74
## PA2 0.74 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  15  and the objective function was  2.81 with Chi Square of  8302.02
## The degrees of freedom for the model are 4  and the objective function was  0.04 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.04 
## 
## The harmonic number of observations is  2956 with the empirical chi square  34.14  with prob <  7e-07 
## The total number of observations was  2956  with Likelihood Chi Square =  104.57  with prob <  1e-21 
## 
## Tucker Lewis Index of factoring reliability =  0.954
## RMSEA index =  0.092  and the 90 % confidence intervals are  0.077 0.108
## BIC =  72.6
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA1  PA2
## Correlation of (regression) scores with factors   0.93 0.90
## Multiple R square of scores with factors          0.86 0.81
## Minimum correlation of possible factor scores     0.73 0.62
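
Because the two factors are correlated, the communalities (h2) are not simply the sum of squared loadings in each row. A minimal sketch of the relationship, using the rounded pattern loadings and factor correlation copied from the output above: with correlated factors, h2 is the diagonal of L Phi L', where L is the pattern matrix and Phi the factor correlation matrix.

```r
# Pattern loadings and factor correlation copied from the two-factor output above
loadings <- matrix(
    c( 0.87, -0.10,
       0.74,  0.02,
       0.67,  0.15,
       0.57,  0.25,
       0.14,  0.69,
      -0.04,  0.81),
    ncol = 2, byrow = TRUE,
    dimnames = list(
        c("concentrating", "hard_working", "enjoy", "interest", "important", "future_goals"),
        c("PA1", "PA2")
    )
)
phi <- matrix(c(1, 0.74, 0.74, 1), ncol = 2)

# With correlated factors, communality = diag(L %*% Phi %*% t(L))
h2 <- diag(loadings %*% phi %*% t(loadings))
round(h2, 2)
```

The results agree with the printed h2 column to within rounding of the inputs (hard_working comes out at 0.57 rather than 0.56). The cross term involving the .74 factor correlation is what lets interest reach a communality of about .60 despite moderate loadings on both factors.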

psych::alpha(ssc)

## 
## Reliability analysis   
## Call: psych::alpha(x = ssc)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd
##       0.88      0.88    0.87      0.54 7.1 0.0036  2.8 0.83
## 
##  lower alpha upper     95% confidence boundaries
## 0.87 0.88 0.88 
## 
##  Reliability if an item is dropped:
##               raw_alpha std.alpha G6(smc) average_r S/N alpha se
## concentrating      0.86      0.86    0.83      0.54 5.9   0.0042
## hard_working       0.86      0.86    0.84      0.55 6.0   0.0042
## enjoy              0.85      0.85    0.83      0.53 5.6   0.0045
## interest           0.85      0.85    0.83      0.53 5.6   0.0045
## important          0.85      0.85    0.83      0.54 5.9   0.0043
## future_goals       0.87      0.87    0.84      0.57 6.5   0.0039
## 
##  Item statistics 
##                  n raw.r std.r r.cor r.drop mean  sd
## concentrating 2956  0.78  0.79  0.73   0.68  2.9 1.0
## hard_working  2956  0.77  0.78  0.72   0.67  2.9 1.0
## enjoy         2956  0.81  0.82  0.77   0.72  2.8 1.1
## interest      2956  0.81  0.81  0.77   0.72  2.9 1.1
## important     2956  0.79  0.79  0.74   0.69  2.7 1.1
## future_goals  2956  0.74  0.74  0.66   0.61  2.5 1.1
## 
## Non missing response frequency for each item
##                  1    2    3    4 miss
## concentrating 0.12 0.20 0.33 0.35    0
## hard_working  0.14 0.20 0.31 0.35    0
## enjoy         0.15 0.20 0.31 0.33    0
## interest      0.14 0.20 0.30 0.37    0
## important     0.19 0.23 0.29 0.28    0
## future_goals  0.26 0.24 0.26 0.24    0

In terms of takeaways, it looks like we could plausibly add interest to overall engagement (concentrating, hard_working, and enjoy): the four items load on a single factor (loadings of .74 to .80) with an alpha of .85. With all six items, future_goals has the lowest loading in the one-factor solution (.66). We could not fit a two-factor solution with only four items, but with all six, future_goals and important load mainly on a second factor (.81 and .69) rather than on the factor shared by the other four items (.57 to .87). The two factors are correlated at .74.

2.

Can we still feel OK about measuring relevance as the mean of importance to you, importance to future goals, and use outside of the program? If the alpha for these three looks OK, that’s enough for me here.

ss2 <- esm %>% 
    dplyr::select(important, future_goals, use_outside)

psych::alpha(ss2)

## 
## Reliability analysis   
## Call: psych::alpha(x = ss2)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd
##       0.84      0.84    0.78      0.63 5.2 0.0051  2.6 0.96
## 
##  lower alpha upper     95% confidence boundaries
## 0.83 0.84 0.85 
## 
##  Reliability if an item is dropped:
##              raw_alpha std.alpha G6(smc) average_r S/N alpha se
## important         0.79      0.79    0.65      0.65 3.8   0.0077
## future_goals      0.77      0.77    0.63      0.63 3.3   0.0085
## use_outside       0.77      0.77    0.62      0.62 3.3   0.0085
## 
##  Item statistics 
##                 n raw.r std.r r.cor r.drop mean  sd
## important    2969  0.86  0.86  0.75   0.69  2.7 1.1
## future_goals 2970  0.88  0.87  0.78   0.71  2.5 1.1
## use_outside  2957  0.88  0.87  0.78   0.71  2.6 1.1
## 
## Non missing response frequency for each item
##                 1    2    3    4 miss
## important    0.19 0.24 0.29 0.28    0
## future_goals 0.26 0.24 0.26 0.24    0
## use_outside  0.22 0.23 0.27 0.28    0
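
With an alpha of .84, the mean of the three items looks workable as a relevance score. The item counts above differ slightly (2969, 2970, and 2957), so some responses are missing. The sketch below shows one way the mean could be computed in that case. The rule of requiring at least two answered items is an assumption for illustration, not something decided in these notes, and `esm_toy` is a made-up stand-in for `esm`.

```r
# Toy stand-in for `esm` (made-up values, including one missing response)
esm_toy <- data.frame(
    important    = c(1, 3, 4, NA),
    future_goals = c(2, 3, 4, 2),
    use_outside  = c(1, 4, 3, 3)
)
items <- c("important", "future_goals", "use_outside")

# Assumption: average the available items, but require at least two answered
n_answered <- rowSums(!is.na(esm_toy[, items]))
esm_toy$relevance <- ifelse(n_answered >= 2,
                            rowMeans(esm_toy[, items], na.rm = TRUE),
                            NA_real_)
esm_toy$relevance
```

Dropping incomplete rows instead (as `complete.cases()` does for the factor analyses) would be the stricter alternative.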

3.

Let’s try to construct some composite measures on the PQA. We can start with the things Neil suggested in his email, which I have pasted below:

I think we should treat the PQA data in the following manner:

1. Youth Development Overall – Sum of the first 11 items in the tool associated with Active Participation, Higher Order Thinking, Belonging and Collaboration, and Opportunities for Agency. It seems this score is likely to be related to interest and enjoyment in particular, and engagement if predicated on interest, enjoyment, and concentration based on initial correlations.
2. Opportunities for Agency – This one warrants consideration by itself given past findings from both your work and mine on the relationship between autonomy and engagement.
3. Pursuit of Self Transcendent Goals – I think this is primarily going to be related to feelings of importance/relevance and will be explored in my dissertation. There are some agency items here too if we want to beef up #2.
4. STEM Content – We could start by summing the STEM content scores to see what we get, unless you all want to break these items down into different scales based on the type of activity taking place. I don’t have any strong feelings at the moment on how these best could be characterized and I don’t think there is any precedent here to guide us in terms of how the tool has been used previously.

I also made codes for the aspects of data modeling: if the PQA indicator for an aspect was at least one (the code below uses val >= 1), then that code was set to one. These codes seem to suggest that students are being supported in doing these activities with some regularity. I am slightly concerned that data modeling is being considered too generally.

Here is that coding frame:

Asking questions: PQA code for Predict
Making observations: PQA code for Classification
Generating data: PQA code for Measure
Data modeling: PQA codes for analyze, model, or symbols (considering removing symbols and analyze)
Interpreting findings: PQA codes for symbols or analyze

# Composite PQA measures (mutate() comes from dplyr)
pqa <- mutate(pqa, 
              youth_development_overall = active_part_1 + active_part_2 +
                  ho_thinking_1 + ho_thinking_2 + ho_thinking_3 +
                  belonging_1 + belonging_2 +
                  agency_1 + agency_2 + agency_3 + agency_4,
              making_observations = stem_sb_8,
              data_modeling = stem_sb_2 + stem_sb_3 + stem_sb_9,
              interpreting_communicating = stem_sb_6,
              generating_data = stem_sb_4,
              asking_questions = stem_sb_1)

ggplot(pqa, aes(x = youth_development_overall)) +
    geom_histogram()

pqa %>% 
    dplyr::select(asking_questions, making_observations, generating_data, data_modeling, interpreting_communicating) %>% 
    gather(key, val) %>% 
    mutate(val = ifelse(val >= 1, 1, 0)) %>% 
    group_by(key) %>% 
    summarize(mean_code = mean(val)) %>% 
    arrange(desc(mean_code))

## # A tibble: 5 x 2
##                          key mean_code
##                        <chr>     <dbl>
## 1              data_modeling 0.5127119
## 2           asking_questions 0.3898305
## 3 interpreting_communicating 0.3771186
## 4        making_observations 0.2584746
## 5            generating_data 0.2245763
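
The summary above reports only the proportion of observations with each code. To look at how the codes co-occur, for example to see whether data modeling is being coded too generally, the 0/1 codes could be kept at the observation level. This is a minimal base-R sketch with made-up values in `pqa_toy`. The column names follow the mutate() call above and the val >= 1 rule follows the existing code.

```r
# Toy stand-in for `pqa` (made-up values; column names follow the mutate() call above)
pqa_toy <- data.frame(
    asking_questions    = c(0, 1, 0, 2),
    making_observations = c(0, 0, 1, 0),
    generating_data     = c(1, 0, 0, 0),
    data_modeling       = c(0, 2, 3, 0)
)

# Recode every aspect to 0/1 (1 when the indicator is at least 1), one row per observation
codes <- as.data.frame(lapply(pqa_toy, function(x) as.integer(x >= 1)))

# Proportion of observations with each code, sorted from most to least frequent
sort(colMeans(codes), decreasing = TRUE)

# Number of aspects coded in each observation
rowSums(codes)
```

Since data_modeling is a sum of three indicators, it has more chances to reach one than the single-indicator aspects, which may partly explain why it is the most frequent code.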

New vars

Joshua Rosenberg

11/9/2017
