The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts in two parts:

Part I: Reflect and Plan

Use your institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies learning analytics to an educational context or topic of interest. More specifically, locate a study that uses the Learning Analytics Workflow we learned today. You are also welcome to select one of your own research papers.

  1. Provide an APA citation for your selected study.

    • Kim, D., Park, Y., Yoon, M., & Jo, I. H. (2016). Toward evidence-based learning analytics: Using proxy variables to improve asynchronous online discussion environments. The Internet and Higher Education, 30, 30-43.
  2. What educational issue, “problem of practice,” and/or questions were addressed?

    • Early prediction of students’ engagement in asynchronous online discussions and of low academic achievement.

  3. Briefly describe any steps of the data-intensive research workflow that are detailed in your article or presentation.

    • Data collection: log data gathered throughout the course, including the number of postings, midterm test scores, and total time spent in the LMS (login duration and time spent on each learning task, as a measure of actual learning time).
    • Data processing: extracting discussion board data, de-identifying student IDs, categorizing postings, counting the number of letters in postings, calculating time-related variables, and handling missing data.
    • Classification: the authors used a random forest technique to classify students as low or high achievers in a prediction model.

  4. What were the key findings or conclusions? What value, if any, might education practitioners find in these results?

    • An evidence-based approach can help address a wide range of educational issues: the authors suggested procedures for transforming conceptual constructs into quantifiable, quantified variables.

    • Proxy variables may be used to improve instructional practices: they can help predict online behavior and point to ways of supporting students.

    • Using proxy variables can enable effective formative assessment.

    • In large-enrollment courses, integrating proxy variables can help identify low achievers, assess specific behaviors, and provide automated recommendations.

  5. Finally, how, if at all, were educators in your self-selected article involved prior to wrangling and analysis?

    • The article gives no direct information on whether the authors were involved in data mining prior to this study, though I believe they were.
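The data-processing steps summarized above can be sketched with dplyr. These include counting postings, counting the characters written, and computing time-related variables per student. This is a minimal illustration, not the procedure used by Kim et al. (2016). The `discussion_log` tibble and its columns (`student_id`, `post_text`, `minutes_on_task`) are hypothetical toy data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy discussion-board log: one row per posting
discussion_log <- tibble(
  student_id      = c(1, 1, 2, 3, 3, 3),
  post_text       = c("I agree", "Good point here", "Why?", "Yes", "See the reading", "Thanks"),
  minutes_on_task = c(12, 8, 5, 20, 15, 10)
)

# Collapse the log to one row per student with candidate proxy variables
proxy_vars <- discussion_log %>%
  group_by(student_id) %>%
  summarise(
    n_postings    = n(),                    # number of postings
    total_letters = sum(nchar(post_text)),  # characters written, spaces included
    total_minutes = sum(minutes_on_task)    # a time-related variable
  )

proxy_vars
```

Note that `nchar()` counts characters including spaces. A stricter letter count would need extra text cleaning.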

Draft a new research question guided by the phases of the Learning Analytics Workflow, or use one of your current research questions.

  1. What educational issue, “problem of practice,” and/or questions are addressed?

    • How does the total time students spend on learning tasks (i.e., reading posts and responding to others) affect the level of interaction among students? Do students who spend more time in online discussions interact more than students who spend less time?

  2. Briefly describe any steps of the data-intensive research workflow that can be detailed in your article or presentation.

  3. How, if at all, will your article touch upon the application(s) of LA to “understand and improve learning and the contexts in which learning occurs?”

Part II: Data Product

In our Learning Analytics code-along, we only scratched the surface of the many ways we can wrangle data.

Using one of the data sets provided in the data folder, your goal for this lab is to extend the Learning Analytics Workflow from our code-along by preparing and wrangling different data.

Alternatively, you may use your own data set in the workflow. If you decide to use your own data set, you must include:

Feel free to create a new script for Lab 2 to work through the following problems. When you are satisfied, add the code to the code chunks below. Don’t forget to run the code to make sure it works.

Instructions:

  1. Add your name to the document as the author.

  2. Set up the first phase (or the first two phases, if using an Introduction) of the LA workflow below. I’ve added the Wrangle section for you. In the Prepare phase, you will need to load the libraries necessary to wrangle the data.

Wrangle

  1. In the chunk called read-data, import sci-online-classes.csv from the data folder and save it as a new object called sci_classes. Then inspect your data using a function of your choice.
# Type your code here
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
sci_classes <- read_csv("data/sci-online-classes.csv")
## Rows: 603 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): course_id, subject, semester, section, Gradebook_Item, Gender
## dbl (23): student_id, total_points_possible, total_points_earned, percentage...
## lgl  (1): Grade_Category
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
view(sci_classes)
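The readr message above suggests setting `show_col_types = FALSE` to quiet the column specification. The sketch below runs without the lab's data folder. It writes a tiny, hypothetical CSV to a temporary file and reads it back with that argument. With the real data you would pass the path used above, or `here("data", "sci-online-classes.csv")`, instead of the temporary file.

```r
library(readr)

# Write a tiny hypothetical CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "student_id,subject,FinalGradeCEMS",
  "1,OcnA,93.5",
  "2,FrScA,NA"
), tmp)

# Read it back; show_col_types = FALSE suppresses the column-specification message
toy_classes <- read_csv(tmp, show_col_types = FALSE)

spec(toy_classes)  # full column specification, as the readr message suggests
toy_classes
```

By default `read_csv()` treats the text `NA` as a missing value, so the second grade is read as a true `NA`.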
  2. In the select-1 code chunk, use the select() function to select student_id, subject, semester, and FinalGradeCEMS. Assign the result to a new object with a different name (you choose the name).
# Type your code here
library(here)
## here() starts at /Users/lolesova/Dropbox (UFL)/LASER2022/Foundation Labs/foundational-skills/foundation_lab_2
sci_classes <- read_csv(here("data", "sci-online-classes.csv"))
## Rows: 603 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): course_id, subject, semester, section, Gradebook_Item, Gender
## dbl (23): student_id, total_points_possible, total_points_earned, percentage...
## lgl  (1): Grade_Category
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sci_classes
## # A tibble: 603 × 30
##    student_id course_id  total…¹ total…² perce…³ subject semes…⁴ section Grade…⁵
##         <dbl> <chr>        <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>  
##  1      43146 FrScA-S21…    3280    2220   0.677 FrScA   S216    02      POINTS…
##  2      44638 OcnA-S116…    3531    2672   0.757 OcnA    S116    01      ATTEMP…
##  3      47448 FrScA-S21…    2870    1897   0.661 FrScA   S216    01      POINTS…
##  4      47979 OcnA-S216…    4562    3090   0.677 OcnA    S216    01      POINTS…
##  5      48797 PhysA-S11…    2207    1910   0.865 PhysA   S116    01      POINTS…
##  6      51943 FrScA-S21…    4208    3596   0.855 FrScA   S216    03      POINTS…
##  7      52326 AnPhA-S21…    4325    2255   0.521 AnPhA   S216    01      POINTS…
##  8      52446 PhysA-S11…    2086    1719   0.824 PhysA   S116    01      POINTS…
##  9      53447 FrScA-S11…    4655    3149   0.676 FrScA   S116    01      POINTS…
## 10      53475 FrScA-S11…    1710    1402   0.820 FrScA   S116    02      POINTS…
## # … with 593 more rows, 21 more variables: Grade_Category <lgl>,
## #   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
## #   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
## #   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
## #   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>,
## #   and abbreviated variable names ¹​total_points_possible,
## #   ²​total_points_earned, ³​percentage_earned, ⁴​semester, ⁵​Gradebook_Item
sci_classes %>%
  select(student_id, subject, semester, FinalGradeCEMS)
## # A tibble: 603 × 4
##    student_id subject semester FinalGradeCEMS
##         <dbl> <chr>   <chr>             <dbl>
##  1      43146 FrScA   S216               93.5
##  2      44638 OcnA    S116               81.7
##  3      47448 FrScA   S216               88.5
##  4      47979 OcnA    S216               81.9
##  5      48797 PhysA   S116               84  
##  6      51943 FrScA   S216               NA  
##  7      52326 AnPhA   S216               83.6
##  8      52446 PhysA   S116               97.8
##  9      53447 FrScA   S116               96.1
## 10      53475 FrScA   S116               NA  
## # … with 593 more rows
sci_classes_2 <- sci_classes %>%
  select(student_id, subject, semester, FinalGradeCEMS)

sci_classes_2
## # A tibble: 603 × 4
##    student_id subject semester FinalGradeCEMS
##         <dbl> <chr>   <chr>             <dbl>
##  1      43146 FrScA   S216               93.5
##  2      44638 OcnA    S116               81.7
##  3      47448 FrScA   S216               88.5
##  4      47979 OcnA    S216               81.9
##  5      48797 PhysA   S116               84  
##  6      51943 FrScA   S216               NA  
##  7      52326 AnPhA   S216               83.6
##  8      52446 PhysA   S116               97.8
##  9      53447 FrScA   S116               96.1
## 10      53475 FrScA   S116               NA  
## # … with 593 more rows

What do you notice about FinalGradeCEMS? (Hint: NAs?)
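One way to answer this question is to count the missing values directly. Here is a minimal sketch on a hypothetical toy tibble. Its rows mimic a few rows from the output above, and the real check would run on `sci_classes`.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for a few rows of sci_classes
toy_classes <- tibble(
  student_id     = c(43146, 44638, 51943, 53475),
  FinalGradeCEMS = c(93.5, 81.7, NA, NA)
)

# Count missing and non-missing final grades
na_summary <- toy_classes %>%
  summarise(
    n_missing = sum(is.na(FinalGradeCEMS)),
    n_present = sum(!is.na(FinalGradeCEMS))
  )

na_summary
```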

  3. In the code chunk named select-2, select all columns except subject and section. Assign the result to a new object with a different name. Examine your data frame with a different function.
# Type your code here
sci_classes_3 <- select(sci_classes, -subject, -section)
sci_classes_3
## # A tibble: 603 × 28
##    student_id course_id  total…¹ total…² perce…³ semes…⁴ Grade…⁵ Grade…⁶ Final…⁷
##         <dbl> <chr>        <dbl>   <dbl>   <dbl> <chr>   <chr>   <lgl>     <dbl>
##  1      43146 FrScA-S21…    3280    2220   0.677 S216    POINTS… NA         93.5
##  2      44638 OcnA-S116…    3531    2672   0.757 S116    ATTEMP… NA         81.7
##  3      47448 FrScA-S21…    2870    1897   0.661 S216    POINTS… NA         88.5
##  4      47979 OcnA-S216…    4562    3090   0.677 S216    POINTS… NA         81.9
##  5      48797 PhysA-S11…    2207    1910   0.865 S116    POINTS… NA         84  
##  6      51943 FrScA-S21…    4208    3596   0.855 S216    POINTS… NA         NA  
##  7      52326 AnPhA-S21…    4325    2255   0.521 S216    POINTS… NA         83.6
##  8      52446 PhysA-S11…    2086    1719   0.824 S116    POINTS… NA         97.8
##  9      53447 FrScA-S11…    4655    3149   0.676 S116    POINTS… NA         96.1
## 10      53475 FrScA-S11…    1710    1402   0.820 S116    POINTS… NA         NA  
## # … with 593 more rows, 19 more variables: Points_Possible <dbl>,
## #   Points_Earned <dbl>, Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>,
## #   q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>,
## #   TimeSpent <dbl>, TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>,
## #   pc <dbl>, uv <dbl>, and abbreviated variable names ¹​total_points_possible,
## #   ²​total_points_earned, ³​percentage_earned, ⁴​semester, ⁵​Gradebook_Item,
## #   ⁶​Grade_Category, ⁷​FinalGradeCEMS
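Dropping columns with `-subject, -section` is equivalent to dropping the vector `-c(subject, section)`. The prompt also asks for a different inspection function, and `names()` is one lightweight option. The sketch below uses a hypothetical toy tibble containing only four of the sci_classes columns.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tibble with a few of the sci_classes columns
toy_classes <- tibble(
  student_id = c(43146, 44638),
  subject    = c("FrScA", "OcnA"),
  semester   = c("S216", "S116"),
  section    = c("02", "01")
)

# Two equivalent ways to drop subject and section
dropped_a <- select(toy_classes, -subject, -section)
dropped_b <- select(toy_classes, -c(subject, section))

identical(names(dropped_a), names(dropped_b))  # TRUE
names(dropped_a)                               # "student_id" "semester"
```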
  4. In the code chunk named filter-1, filter the sci_classes data frame for students in OcnA courses. Assign the result to a new object with a different name. Use the head() function to examine your data frame.
#Type your code here
sci_classes_4 <- filter(sci_classes, subject == "OcnA")
head(sci_classes_4)
## # A tibble: 6 × 30
##   student_id course_id   total…¹ total…² perce…³ subject semes…⁴ section Grade…⁵
##        <dbl> <chr>         <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>  
## 1      44638 OcnA-S116-…    3531    2672   0.757 OcnA    S116    01      ATTEMP…
## 2      47979 OcnA-S216-…    4562    3090   0.677 OcnA    S216    01      POINTS…
## 3      54066 OcnA-S116-…    4641    3429   0.739 OcnA    S116    01      ATTEMP…
## 4      54282 OcnA-S116-…    3581    2777   0.775 OcnA    S116    02      POINTS…
## 5      54342 OcnA-S116-…    3256    2876   0.883 OcnA    S116    02      POINTS…
## 6      54346 OcnA-S116-…    4471    3773   0.844 OcnA    S116    01      ATTEMP…
## # … with 21 more variables: Grade_Category <lgl>, FinalGradeCEMS <dbl>,
## #   Points_Possible <dbl>, Points_Earned <dbl>, Gender <chr>, q1 <dbl>,
## #   q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>,
## #   q9 <dbl>, q10 <dbl>, TimeSpent <dbl>, TimeSpent_hours <dbl>,
## #   TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>, and abbreviated
## #   variable names ¹​total_points_possible, ²​total_points_earned,
## #   ³​percentage_earned, ⁴​semester, ⁵​Gradebook_Item

Q: How many rows does the head() function display? Hint: Check the dimensions of your tibble in the console.
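By default `head()` returns the first six rows. Its `n` argument changes that number, and `dim()` or `nrow()` reports the size of the full tibble. Here is a sketch on a hypothetical 10-row tibble.

```r
library(tibble)

# Hypothetical tibble with 10 rows and 2 columns
toy <- tibble(id = 1:10, subject = rep(c("OcnA", "PhysA"), times = 5))

nrow(head(toy))         # 6: the default number of rows
nrow(head(toy, n = 3))  # 3: only the first three rows
dim(toy)                # 10 2: rows and columns of the full tibble
```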

  5. In the code chunk named filter-2, filter the sci_classes data frame so rows with NA for points earned are removed. Assign the result to a new object with a different name. Use glimpse() to examine all columns of your data frame.
# Type your code here
# keep only rows where FinalGradeCEMS is not missing
sci_classes_5 <- filter(sci_classes, !is.na(FinalGradeCEMS))
head(sci_classes_5)
## # A tibble: 6 × 30
##   student_id course_id   total…¹ total…² perce…³ subject semes…⁴ section Grade…⁵
##        <dbl> <chr>         <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>  
## 1      43146 FrScA-S216…    3280    2220   0.677 FrScA   S216    02      POINTS…
## 2      44638 OcnA-S116-…    3531    2672   0.757 OcnA    S116    01      ATTEMP…
## 3      47448 FrScA-S216…    2870    1897   0.661 FrScA   S216    01      POINTS…
## 4      47979 OcnA-S216-…    4562    3090   0.677 OcnA    S216    01      POINTS…
## 5      48797 PhysA-S116…    2207    1910   0.865 PhysA   S116    01      POINTS…
## 6      52326 AnPhA-S216…    4325    2255   0.521 AnPhA   S216    01      POINTS…
## # … with 21 more variables: Grade_Category <lgl>, FinalGradeCEMS <dbl>,
## #   Points_Possible <dbl>, Points_Earned <dbl>, Gender <chr>, q1 <dbl>,
## #   q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>,
## #   q9 <dbl>, q10 <dbl>, TimeSpent <dbl>, TimeSpent_hours <dbl>,
## #   TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>, and abbreviated
## #   variable names ¹​total_points_possible, ²​total_points_earned,
## #   ³​percentage_earned, ⁴​semester, ⁵​Gradebook_Item
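A test like `FinalGradeCEMS != "NA"` compares each number to the text "NA" rather than checking for missingness. It drops the missing rows only because the comparison returns `NA` for them, and `filter()` discards `NA` results. `is.na()` states the intent directly, and `tidyr::drop_na()` is an equivalent shortcut. The sketch below uses a hypothetical toy tibble. It checks that both approaches keep the same rows, then calls `glimpse()`, which the prompt asks for.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for a few rows of sci_classes
toy_classes <- tibble(
  student_id     = c(43146, 44638, 51943, 53475),
  subject        = c("FrScA", "OcnA", "FrScA", "FrScA"),
  FinalGradeCEMS = c(93.5, 81.7, NA, NA)
)

# Explicit missingness test: keep rows where the final grade is present
kept_filter <- filter(toy_classes, !is.na(FinalGradeCEMS))

# tidyr shortcut that drops rows with NA in the named column
kept_drop_na <- drop_na(toy_classes, FinalGradeCEMS)

identical(kept_filter$student_id, kept_drop_na$student_id)  # TRUE

glimpse(kept_filter)  # one line per column with its type and first values
```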
  6. In the code chunk called arrange-1, arrange the sci_classes data by subject and then by percentage_earned in descending order. Assign the result to a new object. Use the str() function to examine the data type of each column in your data frame.

#Type your code here
sci_classes_6 <- arrange(sci_classes, subject, desc(percentage_earned))

head(sci_classes_6)
## # A tibble: 6 × 30
##   student_id course_id   total…¹ total…² perce…³ subject semes…⁴ section Grade…⁵
##        <dbl> <chr>         <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>  
## 1      70192 AnPhA-S116…    1936    1763   0.911 AnPhA   S116    02      POINTS…
## 2      86488 AnPhA-S116…    3342    3033   0.908 AnPhA   S116    01      POINTS…
## 3      96690 AnPhA-S216…    4804    4309   0.897 AnPhA   S216    01      POINTS…
## 4      91175 AnPhA-S116…    3199    2867   0.896 AnPhA   S116    02      POINTS…
## 5      86267 AnPhA-S116…    3045    2705   0.888 AnPhA   S116    01      POINTS…
## 6      86707 AnPhA-S116…   11355   10026   0.883 AnPhA   S116    02      POINTS…
## # … with 21 more variables: Grade_Category <lgl>, FinalGradeCEMS <dbl>,
## #   Points_Possible <dbl>, Points_Earned <dbl>, Gender <chr>, q1 <dbl>,
## #   q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>,
## #   q9 <dbl>, q10 <dbl>, TimeSpent <dbl>, TimeSpent_hours <dbl>,
## #   TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>, and abbreviated
## #   variable names ¹​total_points_possible, ²​total_points_earned,
## #   ³​percentage_earned, ⁴​semester, ⁵​Gradebook_Item

  7. In the code chunk named final-wrangle, use the sci_classes data and the %>% pipe operator:

sci_classes_7 <- sci_classes %>%
  select(student_id, subject, semester, FinalGradeCEMS) %>%
  filter(subject == "OcnA") %>%
  arrange(semester, desc(FinalGradeCEMS))
head(sci_classes_7)
## # A tibble: 6 × 4
##   student_id subject semester FinalGradeCEMS
##        <dbl> <chr>   <chr>             <dbl>
## 1      66740 OcnA    S116               99.3
## 2      91818 OcnA    S116               96.5
## 3      90090 OcnA    S116               96.3
## 4      88168 OcnA    S116               96.0
## 5      89114 OcnA    S116               95.0
## 6      86758 OcnA    S116               94.6
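The final-wrangle pipeline can be checked on a small, hypothetical tibble before running it on `sci_classes`. The sketch below applies the same `select()`, `filter()`, and `arrange()` steps. It then calls `str()`, which the arrange-1 prompt asks for, to confirm each column's type. The toy values are invented for illustration only.

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows mimicking sci_classes
toy_classes <- tibble(
  student_id     = c(1, 2, 3, 4, 5),
  subject        = c("OcnA", "PhysA", "OcnA", "OcnA", "FrScA"),
  semester       = c("S216", "S116", "S116", "S116", "S216"),
  section        = c("01", "01", "02", "01", "03"),
  FinalGradeCEMS = c(88.0, 97.8, 91.2, 99.3, 75.0)
)

# Same steps as the final-wrangle chunk: keep four columns, keep OcnA rows,
# then sort by semester and by descending grade within each semester
ocna_ranked <- toy_classes %>%
  select(student_id, subject, semester, FinalGradeCEMS) %>%
  filter(subject == "OcnA") %>%
  arrange(semester, desc(FinalGradeCEMS))

str(ocna_ranked)  # column names and types of the result
```

Here the S116 rows come first (students 4 and 3, highest grade first), followed by the single S216 row (student 1).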

Knit & Submit

Congratulations, you’ve completed your Foundation Badge on Learning Analytics Workflow! Complete the following steps to submit your work for review by

  1. Change the author: field in the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.

  2. Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.

  3. Commit your changes in GitHub Desktop and push them to your online GitHub repository.

  4. Publish your HTML page to the web using one of the following methods:

    • Publish on RPubs by clicking the “Publish” button located in the Viewer pane when you knit your document. Note that you will need to create an RPubs account.
    • Publish on GitHub using either GitHub Pages or the HTML previewer.

  5. Post a new discussion on GitHub to our Foundations Badges forum. In your post, include a link to your published web page and write a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.