This document outlines the initial analysis plan for Study 1 of the STEM Research Experience Project, which aims to determine the nature of, and associations among, the underlying components of STEM research experience.
Study 1 relies on archival data from the AScILS (Assessing Scientific Inquiry and Leadership Skills) project, collected from alumni of undergraduate STEM majors. We begin with this dataset because it is the largest and most variable. The purpose of this phase is as follows:
Study 2 will involve confirming the factor model in a sample of high school students and will include mentor ratings of identity.
Study 3 will involve confirming the factor model in a sample of current undergraduate students and will include both measures of identity and a scientific reasoning assessment.
The data for this project were previously published as Study 1 in:
Syed, M., Zurbriggen, E., Chemers, M. M., Goza, B. K., Bearman, S., Crosby, F., Shaw, J. M., Hunter, L., & Morgan, E. M. (2019). The role of self-efficacy and identity in mediating the effects of STEM support experiences. Analysis of Social Issues and Public Policy, 19(1), 7-49. https://doi.org/10.1111/asap.12170
Load the packages we will need:
library(haven)
library(labelled)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(summarytools)
The data file is “SciEngSurvey Alumni Retro MASTER.sav,” located in “~/Box/AScILS/Alumni Survey.” Set the working directory to the Box folder (~/Box) and run the following:
ascils_dat <- haven::read_sav("AScILS/Alumni Survey/SciEngSurvey Alumni Retro MASTER.sav")
dim(ascils_dat)
## [1] 502 374
These are the 19 variables assessing research experience, each rated on a 1-5 scale:
ascils_dat %>% dplyr::select(outcls1:outcls19) %>% labelled::look_for(details = FALSE)
## pos variable label
## 1 outcls1 Participated in Research / Eng Projects
## 2 outcls2 Worked in Sci / Eng
## 3 outcls3 Member of Research / Eng Team
## 4 outcls4 Played Leadership Role
## 5 outcls5 Generated Research Question / Eng Problem
## 6 outcls6 Identified Questions
## 7 outcls7 Collected Data / Identified Constraints
## 8 outcls8 Interpreted Data / Found Solutions
## 9 outcls9 Explained Results / Evaluated Solution Fit
## 10 outcls10 Used Literature
## 11 outcls11 Related Results to Work of Others
## 12 outcls12 Gave Presentation to Students
## 13 outcls13 Gave Professional Presentation
## 14 outcls14 Wrote Article
## 15 outcls15 Planned Research / Projects
## 16 outcls16 Attended Lectures
## 17 outcls17 Attended Conferences
## 18 outcls18 Learned Technical Skills
## 19 outcls19 Learned Terminology
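The items are expected to fall on a 1-5 scale. A quick range check before the factor analysis can catch out-of-range or sentinel codes. The helper below, `check_item_range()`, is a hypothetical name rather than a package function. It counts missing and out-of-range values per item. A small made-up data frame stands in for `ascils_dat`.

```r
# Hypothetical helper: count missing and out-of-range values for each item.
check_item_range <- function(dat, items, min_val = 1, max_val = 5) {
  # Strip haven labels (if any) so comparisons use the raw numeric codes
  vals <- lapply(dat[items], function(x) as.numeric(unclass(x)))
  data.frame(
    item        = items,
    n_missing   = vapply(vals, function(x) sum(is.na(x)), integer(1),
                         USE.NAMES = FALSE),
    n_out_range = vapply(vals, function(x) sum(!is.na(x) & (x < min_val | x > max_val)),
                         integer(1), USE.NAMES = FALSE),
    row.names   = NULL
  )
}

# Toy data standing in for ascils_dat (values are made up for illustration)
toy <- data.frame(outcls1 = c(1, 3, 5, NA), outcls2 = c(2, 9, 4, 1))
check_item_range(toy, c("outcls1", "outcls2"))

# With the real data (not run here):
# check_item_range(ascils_dat, paste0("outcls", 1:19))
```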
In Step 1, randomly select 60% of cases (302 of 502) and conduct an exploratory factor analysis (EFA) to determine the optimal factor structure.
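Step 1 can be sketched in base R as follows. This is an illustration under stated assumptions, not the analysis itself:

- The seed values are arbitrary. Record whichever seed is used so the split is reproducible.
- `split_sample()` and `parallel_analysis()` are hypothetical helper names.
- The parallel analysis follows Horn's approach, using Pearson correlations and normally distributed random data.
- Because the items are 1-5 ratings, polychoric correlations may be preferable for the real analysis. `psych::fa.parallel()` offers a fuller implementation.
- The promax (oblique) rotation is an assumption, chosen because research-experience factors are likely to correlate.
- The demonstration uses simulated data, not AScILS data.

```r
# Hypothetical helper: reproducibly split rows into EFA and CFA subsamples.
split_sample <- function(dat, n_efa, seed) {
  set.seed(seed)
  efa_rows <- sample(seq_len(nrow(dat)), size = n_efa)
  list(efa = dat[efa_rows, , drop = FALSE],
       cfa = dat[-efa_rows, , drop = FALSE])
}

# Hypothetical helper: Horn's parallel analysis on correlation-matrix eigenvalues.
# Leading factors are retained while the observed eigenvalue exceeds the 95th
# percentile of eigenvalues from random normal data of the same dimensions.
parallel_analysis <- function(x, n_iter = 100, seed = 1) {
  x <- as.matrix(x[stats::complete.cases(x), , drop = FALSE])
  n <- nrow(x)
  p <- ncol(x)
  obs <- eigen(stats::cor(x), symmetric = TRUE, only.values = TRUE)$values
  set.seed(seed)
  rand <- replicate(n_iter, {
    z <- matrix(stats::rnorm(n * p), nrow = n)
    eigen(stats::cor(z), symmetric = TRUE, only.values = TRUE)$values
  })                                   # p x n_iter matrix
  thresh <- apply(rand, 1, stats::quantile, probs = 0.95)
  # cumprod() stops the count at the first eigenvalue that fails the threshold
  list(observed = obs, threshold = thresh,
       n_factors = sum(cumprod(obs > thresh)))
}

# Simulated example: 502 cases, 6 items driven by two uncorrelated factors
set.seed(123)
n <- 502
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.5), a2 = f1 + rnorm(n, sd = 0.5), a3 = f1 + rnorm(n, sd = 0.5),
  b1 = f2 + rnorm(n, sd = 0.5), b2 = f2 + rnorm(n, sd = 0.5), b3 = f2 + rnorm(n, sd = 0.5)
)

halves <- split_sample(sim, n_efa = 302, seed = 2024)   # 302 EFA / 200 CFA
pa <- parallel_analysis(halves$efa)
pa$n_factors

# EFA on the 302-case subsample with an oblique (promax) rotation
efa_fit <- stats::factanal(halves$efa, factors = pa$n_factors, rotation = "promax")
print(efa_fit$loadings, cutoff = 0.3)
```

With the real data, `split_sample(ascils_dat, n_efa = 302, seed = ...)` would be applied first. The EFA would then run on `outcls1:outcls19` within the EFA half.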
In Step 2, confirm the finalized structure with a confirmatory factor analysis (CFA) on the remaining 200 cases. Any multidimensional solution should be compared against a one-factor model and any other plausible alternatives.
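The Step 2 comparison could be run with the lavaan package. The approach fits the hypothesized multi-factor model and a one-factor model to the CFA subsample, then compares them on fit indices and a chi-square difference test. The sketch assumes a placeholder two-factor structure (`F1`, `F2`) over six simulated items; the real factors and item assignments would come from Step 1. Treating the 1-5 items as ordinal, through lavaan's `ordered` argument, is a reasonable alternative to the default maximum-likelihood estimation shown here.

```r
library(lavaan)

# Simulated stand-in for the 200-case CFA subsample (not AScILS data);
# the two latent factors correlate at about 0.4.
set.seed(456)
n <- 200
f1 <- rnorm(n)
f2 <- 0.4 * f1 + sqrt(1 - 0.4^2) * rnorm(n)
cfa_dat <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.6), a2 = f1 + rnorm(n, sd = 0.6), a3 = f1 + rnorm(n, sd = 0.6),
  b1 = f2 + rnorm(n, sd = 0.6), b2 = f2 + rnorm(n, sd = 0.6), b3 = f2 + rnorm(n, sd = 0.6)
)

# Placeholder two-factor model and the competing one-factor model
two_factor <- "
  F1 =~ a1 + a2 + a3
  F2 =~ b1 + b2 + b3
"
one_factor <- "
  G =~ a1 + a2 + a3 + b1 + b2 + b3
"

fit2 <- cfa(two_factor, data = cfa_dat)
fit1 <- cfa(one_factor, data = cfa_dat)

# Global fit side by side
indices <- c("chisq", "df", "cfi", "tli", "rmsea", "srmr")
round(rbind(one_factor = fitMeasures(fit1, indices),
            two_factor = fitMeasures(fit2, indices)), 3)

# Chi-square difference test: the one-factor model is nested in the two-factor model
lavTestLRT(fit1, fit2)
```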
Using the finalized factor structure, assess measurement invariance by gender (men/women), race/ethnicity (URM/Asian/White), and major (science/engineering).
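A common approach to this step is a sequence of multi-group CFAs in lavaan with progressively stricter equality constraints:

- configural: same structure in every group;
- metric: equal loadings;
- scalar: equal loadings and intercepts.

Adjacent models are then compared. The sketch below rests on several assumptions. It uses simulated data, a placeholder two-factor model, and a hypothetical grouping column `group`. In the real analysis, the grouping variable would be gender, race/ethnicity, or major, and the model would be the one retained in Step 2. If the items are treated as ordinal, thresholds rather than intercepts would be constrained at the scalar step.

```r
library(lavaan)

# Simulated two-group data (not AScILS data); `group` is a hypothetical column.
set.seed(789)
make_group <- function(n, label) {
  f1 <- rnorm(n)
  f2 <- rnorm(n)
  data.frame(
    a1 = f1 + rnorm(n, sd = 0.6), a2 = f1 + rnorm(n, sd = 0.6), a3 = f1 + rnorm(n, sd = 0.6),
    b1 = f2 + rnorm(n, sd = 0.6), b2 = f2 + rnorm(n, sd = 0.6), b3 = f2 + rnorm(n, sd = 0.6),
    group = label
  )
}
mi_dat <- rbind(make_group(250, "g1"), make_group(250, "g2"))

model <- "
  F1 =~ a1 + a2 + a3
  F2 =~ b1 + b2 + b3
"

configural <- cfa(model, data = mi_dat, group = "group")
metric     <- cfa(model, data = mi_dat, group = "group",
                  group.equal = "loadings")
scalar     <- cfa(model, data = mi_dat, group = "group",
                  group.equal = c("loadings", "intercepts"))

# Chi-square difference tests between adjacent models
lavTestLRT(configural, metric, scalar)

# Change in CFI is often inspected alongside the chi-square tests
sapply(list(configural = configural, metric = metric, scalar = scalar),
       fitMeasures, fit.measures = "cfi")
```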
summarytools::freq(ascils_dat$gennumpp)
## Frequencies
## ascils_dat$gennumpp
## Type: Numeric
##
## Freq % Valid % Valid Cum. % Total % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
## 1 249 57.51 57.51 49.60 49.60
## 2 184 42.49 100.00 36.65 86.25
## <NA> 69 13.75 100.00
## Total 502 100.00 100.00 100.00 100.00
summarytools::freq(ascils_dat$ethnumpp)
## Frequencies
## ascils_dat$ethnumpp
## Type: Numeric
##
## Freq % Valid % Valid Cum. % Total % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
## 1 31 6.21 6.21 6.18 6.18
## 2 71 14.23 20.44 14.14 20.32
## 3 70 14.03 34.47 13.94 34.26
## 4 155 31.06 65.53 30.88 65.14
## 5 33 6.61 72.14 6.57 71.71
## 6 25 5.01 77.15 4.98 76.69
## 7 5 1.00 78.16 1.00 77.69
## 8 2 0.40 78.56 0.40 78.09
## 9 107 21.44 100.00 21.31 99.40
## <NA> 3 0.60 100.00
## Total 502 100.00 100.00 100.00 100.00
summarytools::freq(ascils_dat$ethncity)
## Frequencies
## ascils_dat$ethncity
## Label: Ethnicity
## Type: Numeric
##
## Freq % Valid % Valid Cum. % Total % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
## 0 19 3.78 3.78 3.78 3.78
## 1 170 33.86 37.65 33.86 37.65
## 2 199 39.64 77.29 39.64 77.29
## 3 114 22.71 100.00 22.71 100.00
## <NA> 0 0.00 100.00
## Total 502 100.00 100.00 100.00 100.00
summarytools::freq(ascils_dat$type)
## Frequencies
## ascils_dat$type
## Label: Type - Scientist / Engineer
## Type: Numeric
##
## Freq % Valid % Valid Cum. % Total % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
## 1 392 78.09 78.09 78.09 78.09
## 2 110 21.91 100.00 21.91 100.00
## <NA> 0 0.00 100.00
## Total 502 100.00 100.00 100.00 100.00
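The frequency tables above report numeric codes only. The value labels stored in the .sav file can be inspected before the invariance grouping factors are built. `labelled::val_labels(ascils_dat$ethnumpp)` or `haven::as_factor()` are two ways to do this. The codes can then be collapsed into the planned groups. The helper `collapse_codes()` below is hypothetical. The code-to-group mapping in the example is invented purely for illustration; the actual URM/Asian/White mapping must come from the AScILS codebook.

```r
# Hypothetical helper: collapse numeric codes into named groups.
# `mapping` is a named list: each name is a new group, each element the codes it absorbs.
# Codes not listed in `mapping`, and NA, become NA.
collapse_codes <- function(x, mapping) {
  x <- as.numeric(unclass(x))  # strip haven labels, if any
  out <- rep(NA_character_, length(x))
  for (g in names(mapping)) {
    out[x %in% mapping[[g]]] <- g
  }
  factor(out, levels = names(mapping))
}

# Illustration only: these codes and group assignments are invented.
codes <- c(1, 2, 3, 4, 9, NA)
collapse_codes(codes, list(GroupA = c(1, 2), GroupB = 3, GroupC = 4))

# With the real data (not run), after checking the labels:
# labelled::val_labels(ascils_dat$ethnumpp)
# ascils_dat$eth_group <- collapse_codes(ascils_dat$ethnumpp, ethnicity_mapping)
# (ethnicity_mapping is a hypothetical list built from the codebook)
```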
This step uses simple bivariate correlations, with significance tests for differences in correlation strength. We will then test for differences in correlation strength by gender, race/ethnicity, and major. The identity items are as follows:
ascils_dat %>%
  dplyr::select(ident1, ident3, ident5, ident7, ident9, ident10) %>%
  labelled::look_for(details = FALSE) %>%
  tibble()
## # A tibble: 6 × 3
## pos variable label
## <int> <chr> <chr>
## 1 1 ident1 Sci / Eng Part of Self-Image
## 2 2 ident3 Belong to Sci / Eng Community
## 3 3 ident5 Sci / Eng Reflection of Self
## 4 4 ident7 Think of Self as Sci / Eng
## 5 5 ident9 Belong in Field of Sci / Eng
## 6 6 ident10 I am a Sci / Eng
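The tests of difference in correlation strength could be sketched in base R as follows. The sketch assumes the correlations have already been computed, for example between an identity composite and each research-experience factor score.

- Within the full sample, two correlations that share a variable (identity with factor A vs. identity with factor B) are dependent. Williams' t is one standard test for this case, and it is also what `psych::r.test()` computes when given `r12`, `r13`, and `r23`.
- Across groups (e.g., men vs. women), the correlations are independent, so Fisher's r-to-z test applies.

The function names are hypothetical, and the example inputs are arbitrary numbers, not results.

```r
# Fisher r-to-z test for two correlations from independent groups.
compare_independent_r <- function(r1, n1, r2, n2) {
  z_diff <- atanh(r1) - atanh(r2)          # atanh() is Fisher's z transform
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  z <- z_diff / se
  c(z = z, p = 2 * stats::pnorm(-abs(z)))
}

# Williams' t for two dependent correlations sharing variable 1:
# r12 = cor(identity, factor A), r13 = cor(identity, factor B),
# r23 = cor(factor A, factor B), all from the same n cases.
compare_dependent_r <- function(r12, r13, r23, n) {
  det_r <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23  # determinant of the 3 x 3 matrix
  r_bar <- (r12 + r13) / 2
  t <- (r12 - r13) * sqrt(((n - 1) * (1 + r23)) /
                            (2 * ((n - 1) / (n - 3)) * det_r + r_bar^2 * (1 - r23)^3))
  df <- n - 3
  c(t = t, df = df, p = 2 * stats::pt(-abs(t), df))
}

# Arbitrary example inputs (not study results)
compare_independent_r(r1 = 0.5, n1 = 103, r2 = 0.3, n2 = 103)
compare_dependent_r(r12 = 0.5, r13 = 0.3, r23 = 0.4, n = 502)
```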