This document outlines the initial analysis plan for Study 1 of the STEM Research Experience Project, which aims to determine the nature of, and associations among, the underlying components of STEM research experience.

Study 1 relies on archival data from the AScILS (Assessing Scientific Inquiry and Leadership Skills) project, collected from alumni of undergraduate STEM majors. We begin with this dataset because it is the largest and most variable. This phase has three purposes:

  1. To determine the underlying dimensionality of research experience.
  2. To test the final factor structure for measurement invariance by gender, ethnicity, and major.
  3. To map correlations between dimensions of research experience and participants’ ratings of identity as a scientist.

Study 2 will involve confirming the factor model in a sample of high school students and will include mentor ratings of identity.

Study 3 will involve confirming the factor model in a sample of current undergraduate students and will include both measures of identity and a scientific reasoning assessment.

Study 1 Data: The AScILS Undergraduate Alumni Study

The data for this project were previously published as Study 1 in:

Syed, M., Zurbriggen, E., Chemers, M. M., Goza, B. K., Bearman, S., Crosby, F., Shaw, J. M., Hunter, L., & Morgan, E. M. (2019). The role of self-efficacy and identity in mediating the effects of STEM support experiences. Analyses of Social Issues and Public Policy, 19(1), 7-49. https://doi.org/10.1111/asap.12170

Load the packages we will need:

library(haven)
library(labelled)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(summarytools)

The data file is “SciEngSurvey Alumni Retro MASTER.sav,” located in “~/Box/AScILS/Alumni Survey.” Set the working directory to the Box folder (~/Box), so that the relative path resolves, and run the following:

ascils_dat <- haven::read_sav("AScILS/Alumni Survey/SciEngSurvey Alumni Retro MASTER.sav")
dim(ascils_dat)
## [1] 502 374

RQ1: Determine the Dimensionality of Research Experience

These are the 19 variables assessing research experience, each rated on a 1-5 scale:

labelled::look_for((ascils_dat %>% dplyr::select(outcls1:outcls19)), details = FALSE)
##  pos variable label                                     
##   1  outcls1  Participated in Research / Eng Projects   
##   2  outcls2  Worked in Sci / Eng                       
##   3  outcls3  Member of Research / Eng Team             
##   4  outcls4  Played Leadership Role                    
##   5  outcls5  Generated Research Question / Eng Problem 
##   6  outcls6  Identified Questions                      
##   7  outcls7  Collected Data / Identified Constraints   
##   8  outcls8  Interpreted Data / Found Solutions        
##   9  outcls9  Explained Results / Evaluated Solution Fit
##  10  outcls10 Used Literature                           
##  11  outcls11 Related Results to Work of Others         
##  12  outcls12 Gave Presentation to Students             
##  13  outcls13 Gave Professional Presentation            
##  14  outcls14 Wrote Article                             
##  15  outcls15 Planned Research / Projects               
##  16  outcls16 Attended Lectures                         
##  17  outcls17 Attended Conferences                      
##  18  outcls18 Learned Technical Skills                  
##  19  outcls19 Learned Terminology

In Step 1, randomly select approximately 60% of cases (302 of 502) and conduct an exploratory factor analysis (EFA) to determine the optimal factor structure.
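
The split and a first-pass EFA for Step 1 could be sketched as follows. This is a minimal illustration on simulated data, not the project's analysis code: the seed, the simulated two-cluster structure, the choice of two factors, and the promax rotation are all assumptions made for the example, and only base R is used. The real items are 1-5 ratings and may contain missing values, which this sketch does not address.

```r
set.seed(2024)  # arbitrary seed so the split is reproducible

n_total <- 502  # cases in the dataset
n_efa   <- 302  # cases assigned to the exploratory subsample

# Simulated stand-in for the 19 items: two clusters of items, each driven by
# its own latent variable. This structure is invented for illustration only.
f1 <- rnorm(n_total)
f2 <- rnorm(n_total)
cluster1 <- sapply(1:10, function(i) 0.7 * f1 + rnorm(n_total, sd = 0.7))
cluster2 <- sapply(1:9,  function(i) 0.7 * f2 + rnorm(n_total, sd = 0.7))
sim_items <- as.data.frame(cbind(cluster1, cluster2))
names(sim_items) <- paste0("outcls", 1:19)

# Random split: 302 cases for EFA, the remaining 200 held out for CFA
efa_rows <- sample(seq_len(n_total), size = n_efa)
efa_dat  <- sim_items[efa_rows, ]
cfa_dat  <- sim_items[-efa_rows, ]

# Eigenvalues of the item correlation matrix, for a scree-style inspection
eigenvalues <- eigen(cor(efa_dat))$values
round(eigenvalues, 2)

# Maximum-likelihood EFA with an oblique rotation (two factors is an
# assumption here; the number of factors is what Step 1 has to determine)
efa_fit <- factanal(efa_dat, factors = 2, rotation = "promax")
print(efa_fit$loadings, cutoff = 0.30)
```

Fixing the seed before calling `sample()` keeps the EFA and CFA subsamples identical across reruns, so the holdout cases never leak into the exploratory step.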

In Step 2, confirm the finalized structure with a confirmatory factor analysis (CFA) on the remaining 200 cases. Any multidimensional solution should be compared against a one-factor model and any other plausible models.
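
A comparison of a multidimensional model against a one-factor model could look something like the sketch below. It assumes the lavaan package is installed (it is not among the packages loaded above), and it uses simulated data with hypothetical item names (`x1` to `x6`) and hypothetical factor names (`doing`, `sharing`, `general`); the actual model would come from the Step 1 results.

```r
library(lavaan)  # assumption: lavaan is available; not loaded earlier in this document

set.seed(2024)   # arbitrary seed
n <- 200         # size of the CFA holdout subsample

# Simulated data with two correlated latent variables (illustration only)
f1 <- rnorm(n)
f2 <- 0.5 * f1 + rnorm(n, sd = sqrt(0.75))
sim <- data.frame(
  x1 = 0.7 * f1 + rnorm(n, sd = 0.7),
  x2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  x3 = 0.7 * f1 + rnorm(n, sd = 0.7),
  x4 = 0.7 * f2 + rnorm(n, sd = 0.7),
  x5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  x6 = 0.7 * f2 + rnorm(n, sd = 0.7)
)

# Hypothetical two-factor model and the competing one-factor model
two_factor <- '
  doing   =~ x1 + x2 + x3
  sharing =~ x4 + x5 + x6
'
one_factor <- '
  general =~ x1 + x2 + x3 + x4 + x5 + x6
'

# std.lv = TRUE fixes latent variances to 1 so all loadings are estimated
fit_two <- cfa(two_factor, data = sim, std.lv = TRUE)
fit_one <- cfa(one_factor, data = sim, std.lv = TRUE)

# Global fit indices for each model
fit_indices <- c("chisq", "df", "cfi", "rmsea", "srmr")
fitMeasures(fit_two, fit_indices)
fitMeasures(fit_one, fit_indices)

# Chi-square difference test between the nested models
lavTestLRT(fit_one, fit_two)
```

The one-factor model is nested in the two-factor model (it is equivalent to fixing the factor correlation at 1), which is why a chi-square difference test applies here; non-nested alternatives would need to be compared on information criteria instead.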

RQ2: Assess Measurement Invariance

Starting with the finalized factor structure, assess measurement invariance by gender (men/women), race/ethnicity (URM/Asian/White), and major (science/engineering).

summarytools::freq(ascils_dat$gennumpp)
## Frequencies  
## ascils_dat$gennumpp  
## Type: Numeric  
## 
##               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
##           1    249     57.51          57.51     49.60          49.60
##           2    184     42.49         100.00     36.65          86.25
##        <NA>     69                              13.75         100.00
##       Total    502    100.00         100.00    100.00         100.00
summarytools::freq(ascils_dat$ethnumpp)
## Frequencies  
## ascils_dat$ethnumpp  
## Type: Numeric  
## 
##               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
##           1     31      6.21           6.21      6.18           6.18
##           2     71     14.23          20.44     14.14          20.32
##           3     70     14.03          34.47     13.94          34.26
##           4    155     31.06          65.53     30.88          65.14
##           5     33      6.61          72.14      6.57          71.71
##           6     25      5.01          77.15      4.98          76.69
##           7      5      1.00          78.16      1.00          77.69
##           8      2      0.40          78.56      0.40          78.09
##           9    107     21.44         100.00     21.31          99.40
##        <NA>      3                               0.60         100.00
##       Total    502    100.00         100.00    100.00         100.00
summarytools::freq(ascils_dat$ethncity)
## Frequencies  
## ascils_dat$ethncity  
## Label: Ethnicity  
## Type: Numeric  
## 
##               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
##           0     19      3.78           3.78      3.78           3.78
##           1    170     33.86          37.65     33.86          37.65
##           2    199     39.64          77.29     39.64          77.29
##           3    114     22.71         100.00     22.71         100.00
##        <NA>      0                               0.00         100.00
##       Total    502    100.00         100.00    100.00         100.00
summarytools::freq(ascils_dat$type)
## Frequencies  
## ascils_dat$type  
## Label: Type - Scientist / Engineer  
## Type: Numeric  
## 
##               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
##           1    392     78.09          78.09     78.09          78.09
##           2    110     21.91         100.00     21.91         100.00
##        <NA>      0                               0.00         100.00
##       Total    502    100.00         100.00    100.00         100.00
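
The invariance tests for RQ2 could proceed through the usual configural, metric, and scalar sequence. The sketch below assumes the lavaan package is installed and uses simulated two-group data; the item names (`x1` to `x4`), the factor name `exper`, and the grouping variable `group` are hypothetical placeholders for the finalized factor structure and for variables such as `gennumpp` or `type`.

```r
library(lavaan)  # assumption: lavaan is available; not loaded earlier in this document

set.seed(2024)   # arbitrary seed
n <- 400

# Simulated data: one latent variable, four indicators, two groups.
# The same parameters generate both groups, so invariance holds by design.
f <- rnorm(n)
sim <- data.frame(
  x1 = 0.8 * f + rnorm(n, sd = 0.6),
  x2 = 0.7 * f + rnorm(n, sd = 0.7),
  x3 = 0.6 * f + rnorm(n, sd = 0.8),
  x4 = 0.7 * f + rnorm(n, sd = 0.7),
  group = rep(c("g1", "g2"), each = n / 2)
)

model <- 'exper =~ x1 + x2 + x3 + x4'

# Configural: same pattern of loadings, all parameters free in each group
configural <- cfa(model, data = sim, group = "group")

# Metric: loadings constrained equal across groups
metric <- cfa(model, data = sim, group = "group",
              group.equal = "loadings")

# Scalar: loadings and intercepts constrained equal across groups
scalar <- cfa(model, data = sim, group = "group",
              group.equal = c("loadings", "intercepts"))

# Sequential chi-square difference tests across the three nested models
lavTestLRT(configural, metric, scalar)
```

The same calls extend to a grouping variable with three levels, as in the race/ethnicity comparison. Cases with a missing grouping value (for example, the 69 missing on `gennumpp`) would need to be handled before fitting.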

RQ3: Correlate Dimensions of Research Experience With Identity as a Scientist

Compute simple bivariate correlations between each dimension of research experience and identity, with significance tests for differences in strength between the correlations. Then test for differences in correlation strength by gender, race/ethnicity, and major. The identity items are as follows:

ident_items <- ascils_dat %>%
  dplyr::select(ident1, ident3, ident5, ident7, ident9, ident10)
tibble(labelled::look_for(ident_items, details = FALSE))
## # A tibble: 6 × 3
##     pos variable label                        
##   <int> <chr>    <chr>                        
## 1     1 ident1   Sci / Eng Part of Self-Image 
## 2     2 ident3   Belong to Sci / Eng Community
## 3     3 ident5   Sci / Eng Reflection of Self 
## 4     4 ident7   Think of Self as Sci / Eng   
## 5     5 ident9   Belong in Field of Sci / Eng 
## 6     6 ident10  I am a Sci / Eng
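
For the group comparisons in RQ3 (for example, whether a dimension correlates more strongly with identity among men than among women), one common option is Fisher's r-to-z test for two independent correlations. The sketch below is a minimal base R version; the function name `compare_independent_r` is hypothetical, and the correlation values in the usage line are invented for illustration (only the group sizes come from the gender frequencies above).

```r
# Fisher r-to-z test for the difference between two correlations
# estimated in independent groups (e.g., men vs. women).
compare_independent_r <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                             # Fisher z-transform of each r
  z2 <- atanh(r2)
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))     # standard error of z1 - z2
  z  <- (z1 - z2) / se
  p  <- 2 * pnorm(-abs(z))                    # two-sided p-value
  c(z = z, p = p)
}

# Usage with made-up correlations and the observed gender group sizes
compare_independent_r(r1 = 0.45, n1 = 249, r2 = 0.30, n2 = 184)
```

This test applies only to correlations from separate groups. Comparing how strongly two different dimensions correlate with identity within the same sample involves dependent correlations, which call for a different test.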

END OF STUDY 1!!