The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts for two parts:

Part I: Reflect and Plan

Use the institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies learning analytics to an educational context or topic of interest. More specifically, locate a study that makes use of one of the data structures we learned about today. You are also welcome to select one of your own research papers.

  1. Provide an APA citation for your selected study.

    Xing, W., Zhu, G., Arslan, O., Shim, O., & Popov, V. (2021). Using learning analytics to explore the multifaceted engagement in collaborative learning. Journal of Computing in Higher Education.

  2. What educational issue, “problem of practice,” and/or questions were addressed?

    Measuring multifaceted engagement (behavioral, social, cognitive, emotional, meta-cognitive) when students solved the problem together.

  3. What were some common EDA approaches used, and what did they entail?

    • The researchers collected chat messages and behavioral log data and analyzed them with k-nearest neighbors (KNN), decision tree (DT), random forest (RF), and support vector machine (SVM) approaches.

  4. How were data visualization or feature engineering used to support analysis, if at all? What were the key findings or conclusions?

    • Tables and diagrams were used to explain the results and the model. Behavioral and cognitive engagement had a positive effect on problem solving, while social engagement had a negative effect. Group problem solving had a positive effect on individual cognitive understanding.

  5. Finally, what value, if any, might education practitioners find in these results?

    • To understand and predict how collaborative learning can be structured/organized based on different types of engagement.

Draft a new research question guided by the phases of the Learning Analytics Workflow. Or use one of your current research questions.

  1. What educational issue, “problem of practice,” and/or question is addressed?

    • How can collaborative learning be designed effectively in advance? What factors need to be considered?

  2. Briefly describe any steps of the EDA approach that will be used.

    • Log data can be collected to understand group dynamics, such as who logged in to work on the group problem and when. Based on the log data, the type of group could then be predicted, for example whether a group is effective or not.

  3. What elements of EDA might require human judgement and decision making?

    • Content analysis for problem solving may require human interpretation to understand cognitive engagement during group work.

Part II: Data Product

In our Learning Analytics code-along, we only scratched the surface on the number of ways that we can wrangle the data.

Using one of the data sets provided in the data folder, your goal for this lab is to extend the Data Visualizations using ggplot for Learning Analytics. You have three options for completing the Data Product portion:

  1. Create the visualization exercise provided.

  2. Create a visualization of your choice using a data set from the data folder.

  3. Create a visualization using your own data.

I highly recommend creating a new R script in your lab-3 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your model and answer the questions that follow.

Exercise 1: Using the `sci-online` data, create a basic visualization that:

    • Examines the relationship between two categorical variables.
    • Includes an appropriate title for your chart.
    • Includes a caption that poses a question educators may have about this data that your visualization could help answer.

# YOUR FINAL CODE HERE
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(readxl)
library(readr)
 
sci_online_classes <- read_csv("data/sci-online-classes.csv")
## Rows: 603 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): course_id, subject, semester, section, Gradebook_Item, Gender
## dbl (23): student_id, total_points_possible, total_points_earned, percentage...
## lgl  (1): Grade_Category
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
view(sci_online_classes)
library(here)
## here() starts at /Users/lolesova/Dropbox (UFL)/LASER2022/Foundation Labs/foundational-skills/foundation_lab_3
library(skimr)
skim(sci_online_classes)
Data summary

| | |
|:---|:---|
| Name | sci_online_classes |
| Number of rows | 603 |
| Number of columns | 30 |
| _______________________ | |
| Column type frequency: | |
| character | 6 |
| logical | 1 |
| numeric | 23 |
| ________________________ | |
| Group variables | None |

Variable type: character

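| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|:---|---:|---:|---:|---:|---:|---:|---:|
| course_id | 0 | 1 | 12 | 13 | 0 | 26 | 0 |
| subject | 0 | 1 | 4 | 5 | 0 | 5 | 0 |
| semester | 0 | 1 | 4 | 4 | 0 | 3 | 0 |
| section | 0 | 1 | 2 | 2 | 0 | 4 | 0 |
| Gradebook_Item | 0 | 1 | 9 | 35 | 0 | 3 | 0 |
| Gender | 0 | 1 | 1 | 1 | 0 | 2 | 0 |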

Variable type: logical

| skim_variable | n_missing | complete_rate | mean | count |
|:---|---:|---:|---:|:---|
| Grade_Category | 603 | 0 | NaN | : |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| student_id | 0 | 1.00 | 86069.54 | 10548.60 | 43146.00 | 85612.50 | 88340.00 | 92730.50 | 97441.00 | ▁▁▁▃▇ |
| total_points_possible | 0 | 1.00 | 4274.41 | 2312.74 | 840.00 | 2809.50 | 3583.00 | 5069.00 | 15552.00 | ▇▅▂▁▁ |
| total_points_earned | 0 | 1.00 | 3244.69 | 1832.00 | 651.00 | 2050.50 | 2757.00 | 3875.00 | 12208.00 | ▇▅▁▁▁ |
| percentage_earned | 0 | 1.00 | 0.76 | 0.09 | 0.34 | 0.70 | 0.78 | 0.83 | 0.91 | ▁▁▃▇▇ |
| FinalGradeCEMS | 30 | 0.95 | 77.20 | 22.23 | 0.00 | 71.25 | 84.57 | 92.10 | 100.00 | ▁▁▁▃▇ |
| Points_Possible | 0 | 1.00 | 76.87 | 167.51 | 5.00 | 10.00 | 10.00 | 30.00 | 935.00 | ▇▁▁▁▁ |
| Points_Earned | 92 | 0.85 | 68.63 | 145.26 | 0.00 | 7.00 | 10.00 | 26.12 | 828.20 | ▇▁▁▁▁ |
| q1 | 123 | 0.80 | 4.30 | 0.68 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▂▇▇ |
| q2 | 126 | 0.79 | 3.63 | 0.93 | 1.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▂▆▇▃ |
| q3 | 123 | 0.80 | 3.33 | 0.91 | 1.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▁▃▇▅▂ |
| q4 | 125 | 0.79 | 4.27 | 0.85 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▂▇▇ |
| q5 | 127 | 0.79 | 4.19 | 0.68 | 2.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▂▁▇▅ |
| q6 | 127 | 0.79 | 4.01 | 0.80 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▃▇▅ |
| q7 | 129 | 0.79 | 3.91 | 0.82 | 1.00 | 3.00 | 4.00 | 4.75 | 5.00 | ▁▁▅▇▅ |
| q8 | 129 | 0.79 | 4.29 | 0.68 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▂▇▆ |
| q9 | 129 | 0.79 | 3.49 | 0.98 | 1.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▃▇▇▃ |
| q10 | 129 | 0.79 | 4.10 | 0.93 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▂▃▇▇ |
| TimeSpent | 5 | 0.99 | 1799.75 | 1354.93 | 0.45 | 851.90 | 1550.91 | 2426.09 | 8870.88 | ▇▅▁▁▁ |
| TimeSpent_hours | 5 | 0.99 | 30.00 | 22.58 | 0.01 | 14.20 | 25.85 | 40.43 | 147.85 | ▇▅▁▁▁ |
| TimeSpent_std | 5 | 0.99 | 0.00 | 1.00 | -1.33 | -0.70 | -0.18 | 0.46 | 5.22 | ▇▅▁▁▁ |
| int | 76 | 0.87 | 4.22 | 0.59 | 2.00 | 3.90 | 4.20 | 4.70 | 5.00 | ▁▁▃▇▇ |
| pc | 75 | 0.88 | 3.61 | 0.64 | 1.50 | 3.00 | 3.50 | 4.00 | 5.00 | ▁▁▇▅▂ |
| uv | 75 | 0.88 | 3.72 | 0.70 | 1.00 | 3.33 | 3.67 | 4.17 | 5.00 | ▁▁▆▇▅ |

ggplot(sci_online_classes,
       aes(x = TimeSpent_hours,
           y = percentage_earned,
           color = FinalGradeCEMS)) +
  geom_point() +
  labs(title = "How Time Spent on Course LMS is Related to Percentage Earned in the Course",
       x = "Time Spent (Hours)",
       y = "Percentage of Points Earned")
## Warning: Removed 5 rows containing missing values (`geom_point()`).
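
Exercise 1 asks for two categorical variables, whereas the scatterplot above relates continuous ones. As a minimal sketch of the categorical case, a dodged bar chart of `subject` by `Gender` (both listed as character columns in the column specification) might look like the following. The `toy` data frame and its values, the object names `plot_data` and `p_cat`, and the title and caption wording are illustrative assumptions rather than part of the lab; the toy rows are used only when the CSV file is not found.

```r
library(ggplot2)

# Hypothetical stand-in rows so the sketch runs on its own.
# Column names follow the column specification; the values are made up.
toy <- data.frame(
  subject = c("SubjA", "SubjA", "SubjA", "SubjB", "SubjB", "SubjB"),
  Gender  = c("F", "M", "F", "M", "F", "M")
)

# Use the lab data when it is available, otherwise fall back to the toy rows.
path <- "data/sci-online-classes.csv"
plot_data <- if (file.exists(path)) read.csv(path) else toy

# One bar per subject/gender combination; bar height is the number of rows.
p_cat <- ggplot(plot_data, aes(x = subject, fill = Gender)) +
  geom_bar(position = "dodge") +
  labs(title = "Records by Subject and Gender",
       caption = "Do some subjects have noticeably more records for one gender?",
       x = "Subject",
       y = "Number of Records",
       fill = "Gender") +
  theme_classic()

print(p_cat)
```

Here `position = "dodge"` places the bars side by side so the two categories can be compared directly; `position = "fill"` would show proportions within each subject instead.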

Exercise 2: Using the `sci-online` data, create a basic visualization that:

    • Examines the relationship between two continuous variables (a scatterplot with layers, a log-log or line plot, or one using coord functions).
    • Includes an appropriate title for your chart.
    • Includes a caption that poses a question educators may have about this data that your visualization could help answer.
    • Adds or adjusts any aesthetics to improve the readability or visual appeal of your viz.
    • Uses a color scale, if appropriate, to modify the default colors used by ggplot.
    • Adjusts or removes your legend as appropriate.

# YOUR FINAL CODE HERE
sci_online_classes %>%
  ggplot(aes(x = TimeSpent_hours)) +
  geom_histogram(bins = 10,
                 fill = "red",
                 colour = "black") +
  labs(title = "Time Spent on LMS histogram plot",
       x = "Time Spent(hours)",
       y = "Count") +
  theme_classic()
## Warning: Removed 5 rows containing non-finite values (`stat_bin()`).
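
Exercise 2 asks for two continuous variables, whereas the histogram above shows the distribution of one. A hedged sketch of the two-variable case, assuming `TimeSpent_hours` and `FinalGradeCEMS` as the continuous pair and `subject` as the color grouping, could look like this. The `toy` data frame and its values, the object names, the `"Dark2"` palette choice, and the title and caption wording are illustrative assumptions; the toy rows are used only when the CSV file is not found.

```r
library(ggplot2)

# Hypothetical stand-in rows so the sketch runs on its own.
# Column names follow the skim() output; the values are made up.
toy <- data.frame(
  TimeSpent_hours = c(5, 12, 20, 33, 47, 60),
  FinalGradeCEMS  = c(55, 62, 71, 80, 84, 91),
  subject         = c("SubjA", "SubjB", "SubjA", "SubjB", "SubjA", "SubjB")
)

# Use the lab data when it is available, otherwise fall back to the toy rows.
path <- "data/sci-online-classes.csv"
plot_data <- if (file.exists(path)) read.csv(path) else toy

p_scatter <- ggplot(plot_data,
                    aes(x = TimeSpent_hours,
                        y = FinalGradeCEMS,
                        color = subject)) +
  # Semi-transparent points reduce overplotting; na.rm drops missing rows quietly.
  geom_point(alpha = 0.6, na.rm = TRUE) +
  # Replace the default colors with a qualitative ColorBrewer palette.
  scale_color_brewer(palette = "Dark2") +
  labs(title = "Time Spent in the LMS and Final Grade",
       caption = "Do students who spend more time in the LMS earn higher final grades?",
       x = "Time Spent (Hours)",
       y = "Final Grade",
       color = "Subject") +
  theme_classic() +
  # Move the legend below the plot so the panel keeps its full width.
  theme(legend.position = "bottom")

print(p_scatter)
```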

Knit & Submit

Congratulations, you’ve completed your Foundation Badge on Learning Analytics Workflow! Complete the following steps to submit your work for review:

  1. Change the name of the author: in the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.

  2. Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.

  3. Commit your changes in GitHub Desktop and push them to your online GitHub repository.

  4. Publish your HTML page to the web using one of the following publishing methods:

    • Publish on RPubs by clicking the “Publish” button located in the Viewer Pane when you knit your document. Note that you will need to quickly create an RPubs account.
    • Publish on GitHub using either GitHub Pages or the HTML previewer.

  5. Post a new discussion on GitHub to our Foundations Badges forum. In your post, include a link to your published web page and write a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.