The lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.
To earn a badge for each lab, you are required to respond to a set of prompts for two parts:
In Part I, you will extend our model by adding another variable.
In Part II, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In this part of the badge activity, please add another variable – a variable for the number of days before the start of the module students registered. This variable will be a third predictor. By adding it, you’ll be able to examine how much more accurate your model is (if at all, as this variable might not have great predictive power). Note that this variable is a number and so no pre-processing is necessary.
In doing so, please move all of your code needed to run the analysis over from your case study file here. This is essential for your analysis to be reproducible. You may wish to break your code into multiple chunks based on the overall purpose of the code in the chunk (e.g., loading packages and data, wrangling data, and each of the machine learning steps).
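Before adding the new predictor to the recipe, it can help to confirm that it really is stored as a number and to count its missing values, since R's `glm()` drops rows with missing predictor values by default. The sketch below uses a small made-up data frame (`toy_students`, with invented values) rather than the real OULAD file; only the column names `date_registration` and `imd_band` come from the lab.

```r
# Toy stand-in for the students data; all values are invented for illustration
toy_students <- data.frame(
  date_registration = c(-159, -53, NA, -92, 12),
  imd_band = c(9, 2, 3, NA, 5)
)

# Count missing values in each candidate predictor
missing_counts <- colSums(is.na(toy_students))
missing_counts

# Confirm the new predictor is numeric, so no dummy coding is needed
registration_is_numeric <- is.numeric(toy_students$date_registration)
registration_is_numeric
```

With the real data, the same two checks could be run on `students` right after `read_csv()`.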
## This is the code from Lab 1 Case Study
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
library(tidyverse)
library(janitor)
library(tidymodels)
# Read CSV
students <- read_csv("data/oulad-students.csv")
## Inspect Data
glimpse(students)
# Mutate Variables
students <- students %>%
  mutate(pass = ifelse(final_result == "Pass", 1, 0)) %>% # creates a new variable named "pass", dummy coded 1 if final_result equals "Pass" and 0 if not
  mutate(pass = as.factor(pass)) # makes the variable a factor, helping later steps
## Creating New Independent (predictor) Variable
students <- students %>%
  mutate(disability = as.factor(disability))
## View Data so far
View(students)
## Creating New Independent (predictor) Variable
students <- students %>%
  mutate(disability = ifelse(disability == "Y", 1, 0)) %>% # dummy code: 1 if disability is "Y", 0 if not
  mutate(disability = as.factor(disability)) # makes the variable a factor, helping later steps
## Examine Variables
students %>%
count(id_student) # this many students
students %>%
count(code_module, code_presentation) # this many offerings
## Feature Engineering
students <- students %>%
  mutate(imd_band = factor(imd_band, levels = c("0-10%",
                                                "10-20%",
                                                "20-30%",
                                                "30-40%",
                                                "40-50%",
                                                "50-60%",
                                                "60-70%",
                                                "70-80%",
                                                "80-90%",
                                                "90-100%"))) %>% # this creates a factor with ordered levels
  mutate(imd_band = as.integer(imd_band)) # this changes the levels into integers based on the order of the factor levels
students
## Split Data
set.seed(20230712)
train_test_split <- initial_split(students, prop = .80)
data_train <- training(train_test_split)
data_test <- testing(train_test_split)
## Check Data split
data_train
data_test
## Create a Recipe
my_rec <- recipe(pass ~ disability + imd_band, data = data_train)
my_rec
## Specify Model
# specify model
my_mod <-
logistic_reg()
## Finish Specifying Model
my_mod <-
  logistic_reg() %>%
  set_engine("glm") %>% # generalized linear model
  set_mode("classification") # since we are predicting a dichotomous outcome, specify classification; for a number, specify regression
my_mod
## Add Workflow
my_wf <-
  workflow() %>% # create a workflow
  add_model(my_mod) %>% # add the model we wrote above
  add_recipe(my_rec) # add our recipe we wrote above
## Fit Model
fitted_model <- fit(my_wf, data = data_train)
## Check out Fitted Model
fitted_model
## Last Fit Function
##last_fit(my_wf,train_test_split)
## Final Fit - In the case study that I published, I used my_wf instead of fitted_model because I thought I was getting an error. After doing the Lab 1 Overview, I think I should have used fitted_model.
final_fit <- last_fit(fitted_model, train_test_split)
## Interpret Accuracy
# collect test split predictions
final_fit %>%
collect_predictions()
## Summarize
final_fit %>%
collect_predictions() %>% # see test set predictions
select(.pred_class, pass) %>% # just to make the output easier to view
mutate(correct = .pred_class == pass) # create a new variable, correct, telling us when the model was and was not correct
## Counting Values of Correct 62.7%
final_fit %>%
collect_predictions() %>% # see test set predictions
select(.pred_class, pass) %>% # just to make the output easier to view
mutate(correct = .pred_class == pass) %>% # create a new variable, correct, telling us when the model was and was not correct
tabyl(correct)
## How Accurate was the model?
students %>%
count(pass)
students %>%
  mutate(prediction = sample(c(0, 1), nrow(students), replace = TRUE)) %>%
  mutate(correct = if_else(prediction == 1 & pass == 1 |
                             prediction == 0 & pass == 0, 1, 0)) %>%
  tabyl(correct)
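## Optional comparison point: majority-class baseline
Random guessing is one reference point for judging accuracy. Another common one is always predicting the most frequent class. The following is a minimal sketch of that idea using a made-up outcome vector (`toy_pass` is hypothetical and is not the lab data).

```r
# Made-up outcome vector standing in for students$pass (six 0s and four 1s)
toy_pass <- factor(c(0, 0, 0, 1, 1, 0, 1, 0, 0, 1))

# Share of observations in each class
class_share <- prop.table(table(toy_pass))
class_share

# Accuracy obtained by always predicting the most frequent class
majority_baseline <- max(class_share)
majority_baseline
```

With the real data, the same calculation could be applied to `students$pass` to see whether the model's accuracy is any better than simply predicting the majority class for everyone.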
## This is the code with the third predictor variable added in
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
library(tidyverse)
library(janitor)
library(tidymodels)
# Read CSV
students <- read_csv("data/oulad-students.csv")
## Inspect Data
glimpse(students)
# Mutate Variables
students <- students %>%
  mutate(pass = ifelse(final_result == "Pass", 1, 0)) %>% # creates a new variable named "pass", dummy coded 1 if final_result equals "Pass" and 0 if not
  mutate(pass = as.factor(pass)) # makes the variable a factor, helping later steps
## Creating New Independent (predictor) Variable
students <- students %>%
  mutate(disability = as.factor(disability))
## View Data so far
View(students)
## Creating New Independent (predictor) Variable
students <- students %>%
  mutate(disability = ifelse(disability == "Y", 1, 0)) %>% # dummy code: 1 if disability is "Y", 0 if not
  mutate(disability = as.factor(disability)) # makes the variable a factor, helping later steps
## Creating our new third independent (predictor) variable - it's already a number, so no need for this
##students <- students %>%
## mutate(date_registration = as.factor(date_registration)) # makes the variable a factor, helping later steps
students
## Examine Variables
students %>%
count(id_student) # this many students
students %>%
count(code_module, code_presentation) # this many offerings
## Feature Engineering
students <- students %>%
  mutate(imd_band = factor(imd_band, levels = c("0-10%",
                                                "10-20%",
                                                "20-30%",
                                                "30-40%",
                                                "40-50%",
                                                "50-60%",
                                                "60-70%",
                                                "70-80%",
                                                "80-90%",
                                                "90-100%"))) %>% # this creates a factor with ordered levels
  mutate(imd_band = as.integer(imd_band)) # this changes the levels into integers based on the order of the factor levels
students
## Split Data
set.seed(20230712)
train_test_split <- initial_split(students, prop = .80)
data_train <- training(train_test_split)
data_test <- testing(train_test_split)
## Check Data split
data_train
data_test
## Create a Recipe
my_rec <- recipe(pass ~ disability + imd_band + date_registration, data = data_train)
my_rec
## Specify Model
# specify model
my_mod <-
logistic_reg()
## Finish Specifying Model
my_mod <-
  logistic_reg() %>%
  set_engine("glm") %>% # generalized linear model
  set_mode("classification") # since we are predicting a dichotomous outcome, specify classification; for a number, specify regression
my_mod
## Add Workflow
my_wf <-
  workflow() %>% # create a workflow
  add_model(my_mod) %>% # add the model we wrote above
  add_recipe(my_rec) # add our recipe we wrote above
## Fit Model
fitted_model <- fit(my_wf, data = data_train)
## Check out Fitted Model
##fitted_model
## Last Fit Function
##last_fit(my_wf,train_test_split)
## Final Fit - In the case study that I published, I used my_wf instead of fitted_model because I thought I was getting an error. After doing the Lab 1 Overview, I think I should have used fitted_model.
final_fit <- last_fit(fitted_model, train_test_split)
final_fit
## Interpret Accuracy
# collect test split predictions
final_fit %>%
collect_predictions()
## Summarize
final_fit %>%
collect_predictions() %>% # see test set predictions
select(.pred_class, pass) %>% # just to make the output easier to view
mutate(correct = .pred_class == pass) # create a new variable, correct, telling us when the model was and was not correct
## Counting Values of Correct 62.7%
final_fit %>%
collect_predictions() %>% # see test set predictions
select(.pred_class, pass) %>% # just to make the output easier to view
mutate(correct = .pred_class == pass) %>% # create a new variable, correct, telling us when the model was and was not correct
tabyl(correct)
## How Accurate was the model?
students %>%
count(pass)
students %>%
  mutate(prediction = sample(c(0, 1), nrow(students), replace = TRUE)) %>%
  mutate(correct = if_else(prediction == 1 & pass == 1 |
                             prediction == 0 & pass == 0, 1, 0)) %>%
  tabyl(correct)
Previous results: 62.7% accuracy
New results: 62.7% accuracy
How does the accuracy of this new model compare? Add a few reflections below:
The results are exactly the same with the addition of the third predictor variable, which suggests that the third variable added little or no predictive power to this model.
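One possible reason two models can report identical accuracy is that the added predictor rarely moves any predicted probability across the 0.5 cutoff, so the predicted classes do not change. The sketch below illustrates a side-by-side comparison on simulated data. It is an assumption-laden stand-in, not the lab's analysis: the data are invented, the column names only mirror the lab, and it uses base R `glm()` directly instead of the tidymodels workflow. `test_accuracy()` is a hypothetical helper written for this example.

```r
set.seed(123)
n <- 500

# Simulated stand-in data; column names mirror the lab, values are invented
sim <- data.frame(
  disability = factor(rbinom(n, 1, 0.1)),
  imd_band = sample(1:10, n, replace = TRUE),
  date_registration = round(rnorm(n, mean = -70, sd = 50))
)
# The outcome depends weakly on imd_band only, so the third predictor is noise here
sim$pass <- factor(rbinom(n, 1, plogis(-0.8 + 0.08 * sim$imd_band)))

# 80/20 train-test split
train_rows <- sample(n, size = 0.8 * n)
train <- sim[train_rows, ]
test <- sim[-train_rows, ]

# Hypothetical helper: fit a logistic regression on train, return test accuracy
test_accuracy <- function(formula) {
  fit <- glm(formula, data = train, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")
  pred <- ifelse(prob > 0.5, 1, 0)
  mean(pred == as.integer(as.character(test$pass)))
}

acc_two <- test_accuracy(pass ~ disability + imd_band)
acc_three <- test_accuracy(pass ~ disability + imd_band + date_registration)
c(two_predictors = acc_two, three_predictors = acc_three)
```

In the lab's own workflow, `collect_metrics(final_fit)` from tidymodels reports the test-set accuracy for a `last_fit()` result, which gives a quick way to put the two models' numbers next to each other.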
Part A: Please refer back to Breiman’s (2001) article for these three questions.
Part B:
Part C: Use the institutional library (e.g., NU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies machine learning to an educational context aligned with your research interests. More specifically, locate a machine learning study that involves making predictions.
Provide an APA citation for your selected study.
What research questions were the authors of this study trying to address and why did they consider these questions important?
What were the results of these analyses?
Complete the following steps to knit and publish your work:
First, change the name of the author: in the YAML header at the very top of this document to your name. The YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.
Next, click the knit button in the toolbar above to “knit” your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let your instructor know if you run into any issues with knitting.
Finally, publish your webpage on RPubs by clicking the “Publish” button located in the Viewer pane after you knit your document.
Congratulations, you’ve completed your first badge activity! To receive credit for this assignment and earn your first official Lab Badge, submit the link on Blackboard and share with your instructor.
Once your instructor has checked your link, you will be provided a physical version of the badge below!