As a reminder, to earn a badge for each lab, you are required to respond to a set of prompts in two parts:
In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will create a simple data product in R that demonstrates your ability to apply an analytic technique introduced in this learning lab.
Part A:
Part B: Once again, use your institutional library (e.g., the NCSU Library), Google Scholar, or another search engine to locate a research article, presentation, or resource that applies unsupervised machine learning to an educational context aligned with your research interests. More specifically, locate a machine learning study that involves Latent Profile Analysis or a similar method. You may find published papers that have used LPA helpful in this respect; those can be browsed here.
Provide an APA citation for your selected study.
What research questions were the authors of this study trying to address, and why did they consider these questions important?
What were the results of these analyses?
Like the last data product, this one may be a challenge, too. Here,
estimate latent profiles using your own data. If you do not
have ready access to appropriate data (for LPA, continuous/numeric
data), choose any of the data sets in the data folder of
this repository.
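For example, a quick way to see which data sets are available (assuming you are working from the root of this project and the folder is named data) is sketched below.
# List the CSV files available in the repository's data folder
# (assumes the working directory is the project root)
list.files("data", pattern = "\\.csv$")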
Some code is provided below as a starting point.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(tidyLPA)
## You can use the function citation('tidyLPA') to create a citation for the use of {tidyLPA}.
## Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
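The message above simply means that the open-source mclust backend will be used for estimation. If you would like to make that explicit, estimate_profiles() accepts a package argument; a minimal sketch, where df is a placeholder for any data frame of numeric indicator variables:
# Explicitly request the mclust backend (the default when Mplus is unavailable);
# df is a hypothetical data frame of numeric indicators, used only for illustration
estimate_profiles(df, 3, package = "mclust")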
d <- read_csv("data/dat_csv_combine_final_full.csv")
## Rows: 514 Columns: 71
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): SY ASSISTments Usage, problemType
## dbl (69): ITEST_id, AveKnow, AveCarelessness, AveCorrect.x, NumActions, AveR...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
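As the message suggests, you can quiet this column specification printout; one optional alternative to the call above is to set show_col_types = FALSE:
# Optional: read the same file while suppressing the column specification message
d <- read_csv("data/dat_csv_combine_final_full.csv", show_col_types = FALSE)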
d <- d %>%
  select(AveCarelessness, AveKnow, AveCorrect = AveCorrect.x, AveResBored,
         AveResEngcon, AveResConf, AveResFrust, AveResOfftask,
         AveResGaming, NumActions) %>% # keep the continuous indicators we will model
  janitor::clean_names() # convert the column names to snake_case
# Standardize a numeric vector: subtract the mean and divide by the standard deviation (a z-score)
scale_data <- function(x) {
  x <- x - mean(x, na.rm = TRUE)
  x <- x / sd(x, na.rm = TRUE)
  x
}
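As an aside, base R's scale() performs the same standardization; a minimal sketch of an equivalent helper (the name scale_data_base is purely illustrative) is below. Because scale() returns a one-column matrix, we drop the dimensions with as.numeric().
# Equivalent z-score standardization using base R's scale()
scale_data_base <- function(x) {
  as.numeric(scale(x, center = TRUE, scale = TRUE))
}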
d <- d %>%
  mutate(across(everything(), scale_data, .names = "{.col}_scaled")) %>% # scale every variable with our function
  select(contains("_scaled")) # keep only the scaled versions
d %>%
  estimate_profiles(1:10) %>% # fit LPA solutions with 1 through 10 profiles
  compare_solutions() # compare the solutions' fit indices
## Warning: The solution with the maximum number of classes under consideration was
## considered to be the best solution according to one or more fit indices. Examine
## your results with care and consider estimating more classes.
## Compare tidyLPA solutions:
##
## Model Classes BIC
## 1 1 14701.523
## 1 2 13569.547
## 1 3 12410.738
## 1 4 11845.334
## 1 5 11509.506
## 1 6 11274.194
## 1 7 11129.023
## 1 8 10932.234
## 1 9 10805.967
## 1 10 10688.745
##
## Best model according to BIC is Model 1 with 10 classes.
##
## An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 1 with 10 classes.
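If you would like to examine the full set of fit indices rather than only BIC, tidyLPA's get_fit() returns them as a data frame. A minimal sketch, here arbitrarily limited to one through six profiles to keep the run time manageable:
# Fit a smaller range of solutions and pull the full table of fit statistics
fits <- d %>%
  estimate_profiles(1:6) %>%
  get_fit()
fits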
our_solution <- d %>%
  estimate_profiles(4) # fit the four-profile solution
plot_profiles(our_solution, add_line = TRUE) + # plot each profile's standardized means
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 1))
## Warning: Using shapes for an ordinal variable is not advised
## Warning: It is deprecated to specify `guide = FALSE` to remove a guide. Please
## use `guide = "none"` instead.
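To connect the profiles back to individual students (for example, to select cases for follow-up qualitative analysis), you can extract the modeled data along with each student's assigned profile using tidyLPA's get_data(); a minimal sketch:
# Pull the indicator variables plus each student's assigned profile (Class)
profile_data <- get_data(our_solution)
# How many students fall into each profile?
profile_data %>%
  count(Class)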
Please interpret the results of your analysis below. What did you find? How interpretable and useful are the profiles? And, what next steps - including those involving qualitative analysis - might you take to deepen this analysis?
We found that students in Profile 1 tend to game the system but are nonetheless active, taking many actions in class. Profile 2 students are more frustrated. Profile 3 students are bored and off-task. Profile 4 students are knowledgeable and answer correctly.
Congratulations, you’ve completed the badge for this lab! Complete the following steps to submit your work for review:
Change the name of the author: field in the YAML header at the very top of this document to your own name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.
Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.
Commit your changes in GitHub Desktop and push them to your online GitHub repository.
Publish your HTML page to the web using one of the following publishing methods:
Publish on RPubs by clicking the “Publish” button located in the Viewer Pane when you knit your document. Note: you will need to quickly create an RPubs account.
Publish on GitHub using either GitHub Pages or the HTML previewer.
Post a new discussion to our ML badges forum on GitHub. In your post, include a link to your published web page and a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.