The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.
To earn a badge for each lab, you are required to respond to a set of prompts in two parts:
In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will create a simple data product in R that demonstrates your ability to apply an analytic technique introduced in this learning lab.
Part A:
Part B: Use your institutional library (e.g., the NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies machine learning to an educational context aligned with your research interests. More specifically, locate a machine learning study that involves making predictions.
Provide an APA citation for your selected study.
What research questions were the authors of this study trying to address and why did they consider these questions important?
What were the results of these analyses?
For the data product, you are asked to dig into what it means for a model to be predictively accurate. Specifically, we'll explore several measures of just how accurate the predictions of the model we developed in the guided practice are.
We'll use a shortcut to cut to the chase: interpreting the model.
The code below loads the model we estimated in the guided practice, in the form of the final_fit object. This step is necessary even if you currently have final_fit loaded in your environment/current R session, because everything this document needs must be generated by code within it for it to "knit" successfully.
library(here)
## here() starts at /home/alishinski/Documents/work/laser_institute/machine-learning
library(readr)
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 0.2.0 ──
## ✔ broom 0.8.0 ✔ recipes 1.0.1
## ✔ dials 1.0.0 ✔ rsample 1.0.0
## ✔ dplyr 1.0.9 ✔ tibble 3.1.7
## ✔ ggplot2 3.3.6 ✔ tidyr 1.2.0
## ✔ infer 1.0.2 ✔ tune 1.0.0
## ✔ modeldata 1.0.0 ✔ workflows 1.0.0
## ✔ parsnip 1.0.0 ✔ workflowsets 0.2.1
## ✔ purrr 0.3.4 ✔ yardstick 1.0.0
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ yardstick::spec() masks readr::spec()
## ✖ recipes::step() masks stats::step()
## • Learn how to get started at https://www.tidymodels.org/start/
final_fit <- read_rds("out/ngsschat-final-fit.rds")
final_fit
## # Resampling results
## # Manual resampling
## # A tibble: 1 × 6
## splits id .metrics .notes .predictions .workflow
## <list> <chr> <list> <list> <list> <list>
## 1 <split [3034/759]> train/test split <tibble> <tibble> <tibble> <workflow>
##
## There were issues with some computations:
##
## - Warning(s) x1: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Run `show_notes(.Last.tune.result)` for more information.
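Before drilling into the confusion matrix, you can also glance at the overall test-set metrics stored alongside the fit. A minimal sketch, assuming final_fit is the last_fit() result from the guided practice (collect_metrics() comes from the tune package, loaded with tidymodels):

# Overall held-out metrics (e.g., accuracy, ROC AUC) computed during the guided practice
collect_metrics(final_fit)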
Run the code below to calculate a *confusion matrix*:
# Cross-tabulate the true class (code) against the predicted class (.pred_class)
cm <- final_fit$.predictions[[1]] %>%
  conf_mat(truth = code, estimate = .pred_class)
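If you find it easier to read a plot than a table, conf_mat objects also have an autoplot() method from yardstick; a quick sketch (the heatmap type is one of its built-in options):

# Print the raw counts, then show the same cross-tabulation as a heatmap
cm

autoplot(cm, type = "heatmap")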
Please interpret the above confusion matrix using these guidelines in terms of the true positive, true negative, false positive, and false negative rates. After each of the following (i.e., “True positive”), add both the number and percentage of observations. For instance, if there were 100 true positives out of a total of 400 data points, please write: 100 (25%).
- Accuracy: 0.8722003
- True positive: 0.8125
- True negative: 0.9157175
- False positive: 0.1875
- False negative: 0.0842825
You can read more here about interpreting these in terms of specificity, sensitivity, precision, and recall, four statistics based on the information in the confusion matrix.
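If you'd rather compute these statistics in R than work them out by hand, one option (a sketch, not required for the badge) is to call summary() on the conf_mat object, or to build a yardstick metric_set() over the same predictions, again treating code as the truth column and .pred_class as the estimate:

# Every standard confusion-matrix statistic at once (accuracy, sens, spec, precision, recall, ...)
summary(cm)

# Or just the four discussed above
four_stats <- metric_set(sens, spec, precision, recall)

final_fit$.predictions[[1]] %>%
  four_stats(truth = code, estimate = .pred_class)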
Return to your answer for Part 1A. Now, having examined the true and false positive and negative rates, how good do you think the machine learning model we developed in the case study was? Write your answer after the following bullet point, being specific and drawing on the evidence you have from creating and interpreting the confusion matrix above.
Congratulations, you’ve completed your Prediction badge! Complete the following steps to submit your work for review:
Change the name of the author: in the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn't actually display in the final output.
Click the yarn icon above to "knit" your data product to an HTML file that will be saved in your R Project folder.
Commit your changes in GitHub Desktop and push them to your online GitHub repository.
Publish your HTML page to the web using one of the following publishing methods:
Publish on RPubs by clicking the "Publish" button located in the Viewer Pane when you knit your document. Note: you will need to quickly create an RPubs account.
Publish on GitHub using either GitHub Pages or the HTML previewer.
Post a new discussion on GitHub to our ML Badges forum. In your post, include a link to your published web page and a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.