Initial Interpretations of a Predictive Model
Badge
As a reminder, to earn a badge for each lab, you are required to respond to a set of prompts for two parts:
In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will take very precursory steps toward interpreting a supervised machine learning model.
Part I: Reflect and Plan
Part A:
What are one or more key differences between regression (inferential) and supervised machine learning models?
- Traditional regression models are designed to test specific hypotheses about relationships between variables, often guided by theory. For example, my dissertation and manuscripts are inferential.
- Those studies used models that yield coefficients, p-values, and effect sizes to help explain why and how predictors relate to outcomes.
- For example, I used logistic regression to estimate how predictors such as self-identification as pre-med and cross-racial interactions relate to healthcare career aspirations for Black and Latinx students.
Describe how a supervised machine learning approach could be useful given your research interests/a select research question.
SML models are predictive. They focus less on explaining and more on accurately predicting outcomes for new or unseen cases.
For my previous studies, I would like to explore the entire original dataset, since many variables were removed during the screening process.
Key differences:
Regression is about inference and explanation (theory-driven). SML is about prediction and pattern recognition (data-driven).
Describe how a regression modeling approach (or an extension of a regression approach, such as an SEM or multi-level model) could be useful given your research interests/a select research question.
I would like to expand on research questions (RQs) such as: What college and community factors best predict whether first-gen, rural, SES, and URM students persist in STEM majors and apply to professional schools such as medical schools?
I am thinking about applying an SML model to a large dataset, since I have the HERI UCLA TFS and CSS 2015-2019 surveys. I would like to merge in IPEDS data and possibly request more variables on campus climate. Originally, I had many predictors, but that number was reduced when I removed missing values. I would like to understand the complex patterns and identify combinations of factors that strongly predict outcomes. I would also like to have an “identifying score” for students, which could guide targeted early interventions.
I am considering complementing ML results with traditional regression or SEM for theory-driven interpretation and to make the “black box” more transparent, since ML models optimize prediction but do not easily show why a prediction happens. I want to show how systemic racism, institutional climate, or intersectional barriers shape outcomes, not just predict them.
I need to learn more about how to use models for transparency. I am willing to accept slightly lower prediction accuracy than complex models offer in exchange for a better fit with my commitment to transparent equity analysis. I am considering decision trees to explain how different characteristics combine, or other models such as rule-based or monotonic models, which I can constrain to align with theory (e.g., negative cross-racial interaction should never increase success probability).
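To make the monotonic-constraint idea concrete, here is a minimal sketch in plain Python (used only for illustration; it is not the lab's R workflow). It assumes a hypothetical logistic-style score with two made-up predictors and hand-picked weights, none of which are estimated from the HERI data. The weight on negative cross-racial interaction is clamped so it can never be positive, and a helper checks that the predicted probability never rises as that predictor increases.

```python
import math

# Hypothetical, hand-picked weights for illustration only.
# They are NOT estimated from any real dataset.
WEIGHTS = {
    "premed_identity": 0.8,         # assumed positive association
    "negative_cross_racial": -0.5,  # theory says this must never help
}
INTERCEPT = -0.2


def clamp_non_positive(weight):
    """Force a weight to be <= 0 so its predictor can never raise the score."""
    return min(weight, 0.0)


def success_probability(student, weights=WEIGHTS, intercept=INTERCEPT):
    """Logistic score with the theory-driven sign constraint applied."""
    constrained = dict(weights)
    constrained["negative_cross_racial"] = clamp_non_positive(
        constrained["negative_cross_racial"]
    )
    z = intercept + sum(constrained[name] * student[name] for name in constrained)
    return 1.0 / (1.0 + math.exp(-z))


def is_non_increasing(values, tol=1e-12):
    """True when each value is no larger than the one before it."""
    return all(b <= a + tol for a, b in zip(values, values[1:]))


# Sweep the constrained predictor while holding the other one fixed.
levels = [0, 1, 2, 3, 4]
probs = [
    success_probability({"premed_identity": 1, "negative_cross_racial": x})
    for x in levels
]
print(is_non_increasing(probs))
```

Even if a fitting procedure proposed a positive weight for that predictor, the clamp would flatten its effect to zero rather than let it increase the predicted probability. Dedicated libraries offer monotonic constraints for tree-based models, but which one fits this project is still an open question.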
Part B: Use institutional library access (e.g. NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies supervised machine learning to an educational context aligned with your research interests.
Provide an APA citation for your selected study.
I had no luck finding an article within my area of research.
What research questions were the authors of this study trying to address using Latent Profile Analysis or a similar method?
What were the results of these analyses?
Part II: Interpret our Supervised Machine Learning Model
Here, we are going to interpret our supervised machine learning model in a very precursory way. Later, you will have the opportunity to dive deep into metrics for interpreting supervised machine learning models. What does this model seem to tell us? How useful is this predictive model? How could it be useful to educational stakeholders?
My interpretation of supervised machine learning is that it takes a large dataset with known outcomes and “learns” the patterns or combinations of factors that best predict those outcomes. Once trained, it can estimate outcomes for new data. It tells you which patterns in your data tend to go together, so you can estimate what might happen for new cases.
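The "learn from known outcomes, then estimate new cases" cycle described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records are made up, the single predictor (hours of advising) is hypothetical, and the "model" is a one-threshold rule rather than anything used in the lab.

```python
# Toy, made-up records: (hours_of_advising, persisted_in_major).
# These are NOT real data; they only illustrate the workflow.
train = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]

# "New" cases the rule never saw while training, with outcomes kept for checking.
new_cases = [(1.5, 0), (4.5, 1), (2.8, 0)]


def fit_stump(rows):
    """Learn one cutoff: predict 1 when the predictor is at or above it.

    Candidate cutoffs are midpoints between adjacent observed values;
    the one with the fewest training errors wins.
    """
    xs = sorted({x for x, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(cutoff):
        return sum((x >= cutoff) != bool(y) for x, y in rows)

    return min(candidates, key=errors)


def predict(cutoff, x):
    """Apply the learned cutoff to a single case."""
    return int(x >= cutoff)


threshold = fit_stump(train)                      # "learning" step
predictions = [predict(threshold, x) for x, _ in new_cases]
accuracy = sum(p == y for p, (_, y) in zip(predictions, new_cases)) / len(new_cases)

print(threshold, predictions, round(accuracy, 2))
```

The rule fits the training records perfectly but misses one of the new cases, which is why performance is judged on data the model has not seen.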
What does SML tell me as a researcher? Supervised ML tells me which factors matter most for predicting an outcome, sometimes revealing unexpected combinations. It also shows how multiple factors interact in complex, non-linear ways, where theory-based assumptions may hold true, and where hidden relationships might exist.
How useful is this predictive model in my context? I want to move beyond describing general trends and start forecasting opportunities for individuals or schools. I have big, complex datasets where regressions are cumbersome or miss subtle interactions. I need to identify where to focus limited resources, such as which students or campuses would benefit most from STEM career counseling. Before diving into SML, I feel I need clear causal inferences. ML can show patterns but may not test theory as cleanly as inferential approaches such as SEM. I would like to explore how to use this fellowship to write grant proposals.
Knit and Publish
Complete the following steps to knit and publish your work:
First, change the name of the author: in the YAML header at the very top of this document to your name. The YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.
Next, click the knit button in the toolbar above to “knit” your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let us know if you run into any issues with knitting.
Finally, publish your webpage on Posit Cloud by clicking the “Publish” button located in the Viewer Pane after you knit your document. See screenshot below.
Receive Your Badge
To receive credit for this assignment and earn your first ML badge, share the link to your published webpage under the next incomplete badge artifact column on the LASER Scholar Information and Documents spreadsheet: https://go.ncsu.edu/laser-sheet.
Once your instructor has checked your link, you will be provided a physical version of the badge below!