Data Sources badge

The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts for two parts:

In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will create a simple data product in R that demonstrates your ability to apply a data analysis technique introduced in this learning lab.

Part I: Reflect and Plan

Use the institutional library (e.g. NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies learning analytics to an educational context or topic of interest. More specifically, locate a study that makes use of one of the data structures we learned about today. You are also welcome to select one of your own research papers.

Provide an APA citation for your selected study.
- Bayazit, A., Apaydin, N., & Gonullu, I. (2022). Predicting at-risk students in an online flipped anatomy course using learning analytics. Education Sciences, 12(9), 581. https://doi.org/10.3390/educsci12090581
What types of data are associated with LA?
- Moodle raw data
What type of data structures are analyzed in the educational context?
- Excel file comprising 119,517 lines with 9 columns: time, username, affected user, event context, component, event name, description, origin, and IP address.
How might this article be used to better understand a dataset or educational context of personal or professional interest to you?
- This article provides insight into the characteristics associated with how different students navigate a learning platform, characteristics that can serve as indicators of the depth/effectiveness of their learning of the platform content. This type of insight is what I hope to glean from analysis of log data associated with the online platform that will host the teacher materials developed in the Deeper Discussions in World History project.
Finally, how do these processes compare with what teachers and educational organizations already do to support and assess student learning?
- Oftentimes teachers use students’ assignment completion and activity participation as measures of student learning and engagement in the course, which gives them a sense of students’ likelihood of course success (e.g., if a student isn’t turning in assignments or engaging in class activities, they are unlikely to do well in the course, as these behaviors are correlated with student learning of the course content).

Draft a research question guided by techniques and data sources that you are potentially interested in exploring in more depth.

Draft RQ: Which supervised learning technique can most accurately predict teachers who facilitate less effective classroom discussions?

What data source(s) should be analyzed or discussed?
- Log data of the online platform that hosts the World History lesson materials and professional development toolkit.
What is the purpose of your article?
- To demonstrate how well a predictive model can identify teachers who facilitate less effective classroom discussions, using platform-related engagement data.
Explain the analytical level at which these data would need to be collected and analyzed.
- The raw data would comprise every teacher's interactions with the platform contents. Each teacher would occupy one row, and the interaction variables would form the columns to be analyzed. The interaction variables would be as follows:
  
  Interaction Variable - Variable Description
  - session - The number of sessions by the teacher
  - Time - The total time the teacher has spent on the Moodle LMS
  - UniqueDay - The number of unique days logged in by the teacher
  - ResourcePath - The starting and ending path associated with each resource engagement
  - TotalAction - The number of total activities/resource path events
  - CourseView - The number of lesson views
  - ResourceView - The number of lesson resource views
- The outcome variable (the variable to be predicted) would be discussion effectiveness (EffectiveDiscussion). I would apply five commonly used classification algorithms: k-nearest neighbors (kNN), decision trees (DT), naïve Bayes (NB), random forest (RF), and support vector machines (SVM). I would use performance metrics and confusion matrices for model evaluation.
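
A minimal sketch of this analytical level, assuming a hypothetical event-level log (the `log_events` data, its column names, and every value are invented for illustration). It collapses the log to one row per teacher with a few of the interaction variables as columns, then shows how a confusion matrix and accuracy would be computed. No model is fit here; the `predicted` labels are made up to stand in for a classifier's output.

```r
library(dplyr)

# Hypothetical event-level log: one row per platform event (all values invented).
log_events <- data.frame(
  teacher_id = c("t1", "t1", "t1", "t2", "t2", "t3"),
  date       = as.Date(c("2023-01-02", "2023-01-02", "2023-01-05",
                         "2023-01-03", "2023-01-04", "2023-01-03")),
  event      = c("course_view", "resource_view", "resource_view",
                 "course_view", "course_view", "resource_view")
)

# Collapse to one row per teacher, with interaction variables as columns.
teacher_features <- log_events %>%
  group_by(teacher_id) %>%
  summarise(
    UniqueDay    = n_distinct(date),               # unique days with activity
    TotalAction  = n(),                            # total logged events
    CourseView   = sum(event == "course_view"),    # lesson views
    ResourceView = sum(event == "resource_view")   # lesson resource views
  )
teacher_features

# Made-up true and predicted labels for five teachers (placeholders only).
lv        <- c("effective", "less_effective")
actual    <- factor(c("less_effective", "effective", "less_effective",
                      "effective", "effective"), levels = lv)
predicted <- factor(c("less_effective", "effective", "effective",
                      "effective", "less_effective"), levels = lv)

# Confusion matrix: rows are predictions, columns are true labels.
conf_mat <- table(predicted = predicted, actual = actual)
conf_mat

# Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy  # 0.6 for these made-up labels
```

In an actual analysis, the `predicted` vector would come from each of the five classifiers, and the same `table()` call would let their confusion matrices be compared side by side.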
How, if at all, will your article touch upon the application(s) of LA to “understand and improve learning and the contexts in which learning occurs?”
- It will give insight into the ways that teachers engage with the online teaching and professional learning resources and the association of those patterns with their relevant teaching practices/competence demonstrations.

Part II: Data Product

After you finish the script file for lab1_badge, add it to the community board.

Problem 1

Create a data frame that includes two columns, one named “students” and the other named “foods.” The first column should be this vector (note the intentional repeated values): Thor, Rogue, Electra, Electra, Wolverine

The second column should be this vector: Bread, Orange, Chocolate, Carrots, Milk

# YOUR FINAL CODE BELOW
df_stuFoods <- data.frame(students  = c("Thor", "Rogue", "Electra", "Electra",
                                        "Wolverine"),
                          foods = c("Bread", "Orange", "Chocolate", "Carrots",
                                    "Milk"))
df_stuFoods

##    students     foods
## 1      Thor     Bread
## 2     Rogue    Orange
## 3   Electra Chocolate
## 4   Electra   Carrots
## 5 Wolverine      Milk

Problem 2

Using the data frame created in Problem 1, use the table() command to create a frequency table for the column called “students.”

# YOUR FINAL CODE BELOW
table(df_stuFoods$students)

## 
##   Electra     Rogue      Thor Wolverine 
##         2         1         1         1 

Problem 3

Create a vector of five numbers of your choice between 0 and 10, save that vector to an object, and use the sum() function to calculate the sum of the numbers.

# YOUR FINAL CODE BELOW
numbers <- c(9, 4, 2, 7, 1)
sum(numbers)

## [1] 23

Problem 4

a. Create code to read the data/sci-online-classes.csv file into R using function(s) from the tidyverse package. (Note: this requires the package tidyverse). Save the data as an object called sci_classes.
b. Examine the contents of sci_classes in your console. Is your object a tibble? How do you know? (Hint: Check the output in the console.)

# YOUR FINAL CODE BELOW
#Answer to a.
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

sci_classes <- read_csv("data/sci-online-classes.csv")

## Rows: 603 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): course_id, subject, semester, section, Gradebook_Item, Gender
## dbl (23): student_id, total_points_possible, total_points_earned, percentage...
## lgl  (1): Grade_Category
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

sci_classes

## # A tibble: 603 × 30
##    student_id course_id     total_points_possible total_points_earned
##         <dbl> <chr>                         <dbl>               <dbl>
##  1      43146 FrScA-S216-02                  3280                2220
##  2      44638 OcnA-S116-01                   3531                2672
##  3      47448 FrScA-S216-01                  2870                1897
##  4      47979 OcnA-S216-01                   4562                3090
##  5      48797 PhysA-S116-01                  2207                1910
##  6      51943 FrScA-S216-03                  4208                3596
##  7      52326 AnPhA-S216-01                  4325                2255
##  8      52446 PhysA-S116-01                  2086                1719
##  9      53447 FrScA-S116-01                  4655                3149
## 10      53475 FrScA-S116-02                  1710                1402
## # ℹ 593 more rows
## # ℹ 26 more variables: percentage_earned <dbl>, subject <chr>, semester <chr>,
## #   section <chr>, Gradebook_Item <chr>, Grade_Category <lgl>,
## #   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
## #   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
## #   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
## #   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>

#View(sci_classes)
#Answer to b.
#Yes, sci_classes is a tibble because the console output begins with "A tibble: 603 × 30". Also, the object is labelled spec_tbl_df, which indicates that it inherits from the tbl_df class.

is_tibble(sci_classes) #This function confirms the tibble status by returning TRUE

## [1] TRUE
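
The class-based reasoning above can also be checked directly. The sketch below uses a small made-up data frame (not the course data) to compare the class vector of a plain data frame with that of a tibble; the object names `df` and `tb` are placeholders.

```r
library(tibble)

# A small made-up data frame (not the course data).
df <- data.frame(id = 1:3, score = c(90, 85, 70))
tb <- as_tibble(df)

class(df)  # "data.frame"
class(tb)  # "tbl_df" "tbl" "data.frame"

is_tibble(df)               # FALSE
is_tibble(tb)               # TRUE
inherits(tb, "tbl_df")      # TRUE
inherits(tb, "data.frame")  # TRUE: a tibble is still a data frame
```

Running class(sci_classes) in the same way would list the classes of the imported object, including any extra class that read_csv() adds on top of tbl_df.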

Problem 5

Using the sci_classes data frame:

Select all columns except subject and section.
Assign to a new object with a different name.
Examine your data frame.

# YOUR FINAL CODE BELOW

new_sci_classes <- subset(sci_classes, select = -c(subject, section))
new_sci_classes

## # A tibble: 603 × 28
##    student_id course_id     total_points_possible total_points_earned
##         <dbl> <chr>                         <dbl>               <dbl>
##  1      43146 FrScA-S216-02                  3280                2220
##  2      44638 OcnA-S116-01                   3531                2672
##  3      47448 FrScA-S216-01                  2870                1897
##  4      47979 OcnA-S216-01                   4562                3090
##  5      48797 PhysA-S116-01                  2207                1910
##  6      51943 FrScA-S216-03                  4208                3596
##  7      52326 AnPhA-S216-01                  4325                2255
##  8      52446 PhysA-S116-01                  2086                1719
##  9      53447 FrScA-S116-01                  4655                3149
## 10      53475 FrScA-S116-02                  1710                1402
## # ℹ 593 more rows
## # ℹ 24 more variables: percentage_earned <dbl>, semester <chr>,
## #   Gradebook_Item <chr>, Grade_Category <lgl>, FinalGradeCEMS <dbl>,
## #   Points_Possible <dbl>, Points_Earned <dbl>, Gender <chr>, q1 <dbl>,
## #   q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>,
## #   q9 <dbl>, q10 <dbl>, TimeSpent <dbl>, TimeSpent_hours <dbl>,
## #   TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>

#View(new_sci_classes)
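
Since the tidyverse is already loaded for this lab, the same column removal could also be written with dplyr's select(). The sketch below is one possible alternative, shown on a made-up stand-in for sci_classes (`toy_classes` and its values are invented) so that it runs without the course data.

```r
library(dplyr)

# Made-up stand-in for sci_classes with only a few columns.
toy_classes <- data.frame(
  student_id = c(1, 2),
  subject    = c("FrScA", "OcnA"),
  section    = c("02", "01"),
  TimeSpent  = c(120, 95)
)

# Base R approach, as used in the answer above.
via_subset <- subset(toy_classes, select = -c(subject, section))

# dplyr approach: a minus sign drops the named columns.
via_select <- select(toy_classes, -c(subject, section))

names(via_select)  # "student_id" "TimeSpent"

# Both approaches keep the same columns in the same order.
identical(names(via_subset), names(via_select))  # TRUE
```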

Knit & Submit

Congratulations, you’ve completed your Data Sources Badge!

Complete the following steps to knit and publish your work:

First, change the name in the author: field of the YAML header at the very top of this document to your name. The YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.
Next, click the knit button in the toolbar above to “knit” your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let us know if you run into any issues with knitting.
Finally, publish your webpage on Posit Cloud by clicking the “Publish” button located in the Viewer Pane after you knit your document. See screenshot below.

Foundations Learning Badge 1

Congratulations, you’ve completed Foundations Learning Badge 1! To receive credit for this assignment and earn an official Foundations LASER Badge, share the link to your published webpage under an empty Badge Artifact column on the 2023 LASER Scholar Information and Documents spreadsheet: https://go.ncsu.edu/laser-sheet. We recommend bookmarking this spreadsheet, as we’ll be using it throughout the year to keep track of your progress.

Once your instructor has checked your link, you will be provided a physical version of the badge below!