Data Sources Badge

The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you must respond to a set of prompts in two parts:

In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will create a simple data product in R that demonstrates your ability to apply a data analysis technique introduced in this learning lab.

Part I: Reflect and Plan

Use your institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies learning analytics to an educational context or topic of interest. More specifically, locate a study that makes use of one of the data structures we learned about today. You are also welcome to select one of your own research papers.

Provide an APA citation for your selected study.
- Tarik, A., Aissa, H., & Yousef, F. (2021). Artificial intelligence and machine learning to predict student performance during the COVID-19. Procedia Computer Science, 184, 835–840.
What types of data are associated with learning analytics (LA)?
- Student information
- Academic performance
What types of data structures are analyzed in the educational context?
- Students’ grades in technical, science, and literature courses at three levels (common core, first-year baccalaureate, and second-year baccalaureate)
How might this article be used to better understand a dataset or educational context of personal or professional interest to you?
- It shows how machine learning algorithms used in LA (e.g., support vector regression (SVR), random forest, and partial least squares regression) can predict students’ performance.
Finally, how do these processes compare with what teachers and educational organizations already do to support and assess student learning?
- In this study, random forest predicted students’ grades more accurately than linear regression.

Draft a research question guided by techniques and data sources that you are potentially interested in exploring in more depth.

What data source(s) should be analyzed or discussed?
- Administrative data, such as teacher and school information
- Multimodal data
- Different machine learning models
What is the purpose of your article?
- To identify predictive models for forecasting academic performance
Explain the analytical level at which these data would need to be collected and analyzed.
- Data would need to be collected at the metadata level from real-time classroom activities.
How, if at all, will your article touch upon the application(s) of LA to “understand and improve learning and the contexts in which learning occurs”?
- Machine learning is a powerful tool for forecasting academic performance because it can analyze large amounts of data. Accurate predictive models can help educators identify at-risk students and provide targeted interventions to support their success.

Part II: Data Product

After you finish the script file for lab1_badge, add it to the community board.

Problem 1

Create a data frame that includes two columns, one named “Students” and the other named “Foods”. The first column should be this vector (note the intentionally repeated values): Thor, Rogue, Electra, Electra, Wolverine.

The second column should be this vector: Bread, Orange, Chocolate, Carrots, Milk.

# YOUR FINAL CODE HERE

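# Column vectors; "Electra" is intentionally repeated
Students <- c("Thor", "Rogue", "Electra", "Electra", "Wolverine")
Foods <- c("Bread", "Orange", "Chocolate", "Carrots", "Milk")
# Combine the two vectors into a data frame with named columns
df <- data.frame(Students = Students, Foods = Foods)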

df

##    Students     Foods
## 1      Thor     Bread
## 2     Rogue    Orange
## 3   Electra Chocolate
## 4   Electra   Carrots
## 5 Wolverine      Milk

# Two-way table: how often each student appears with each food
table(df)

##            Foods
## Students    Bread Carrots Chocolate Milk Orange
##   Electra       0       1         1    0      0
##   Rogue         0       0         0    0      1
##   Thor          1       0         0    0      0
##   Wolverine     0       0         0    1      0
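Because the Students column intentionally repeats "Electra", it can help to confirm which rows are repeats and which foods belong to each student. A minimal base-R sketch that rebuilds the same data frame so it runs on its own (`stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version):

```r
# Rebuild the Problem 1 data frame so this example is self-contained
df <- data.frame(
  Students = c("Thor", "Rogue", "Electra", "Electra", "Wolverine"),
  Foods    = c("Bread", "Orange", "Chocolate", "Carrots", "Milk"),
  stringsAsFactors = FALSE
)

# TRUE marks a row whose Students value already appeared in an earlier row
repeat_flags <- duplicated(df$Students)
df[repeat_flags, ]  # row 4: Electra, Carrots

# Group the Foods values by student (split() orders the groups alphabetically)
foods_by_student <- split(df$Foods, df$Students)
foods_by_student$Electra
```

Unlike `table(df)`, which only counts combinations, `split()` keeps the actual food names for each student.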

Problem 2

Using the data frame created in Problem 1, use the table() function to create a frequency table for the column called “Students”.

students_freq <- table(df$Students)

students_freq

## 
##   Electra     Rogue      Thor Wolverine 
##         2         1         1         1

# Same counts, computed from the original Students vector rather than the data frame column
table(Students)

## Students
##   Electra     Rogue      Thor Wolverine 
##         2         1         1         1
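The prompt asks for table(), which returns a named table object. If you prefer a tidyverse workflow, dplyr's count() produces the same frequencies as a data frame with a column named n, which is easier to sort, filter, or join. A sketch, assuming the dplyr package is installed:

```r
library(dplyr)

df <- data.frame(
  Students = c("Thor", "Rogue", "Electra", "Electra", "Wolverine"),
  Foods    = c("Bread", "Orange", "Chocolate", "Carrots", "Milk"),
  stringsAsFactors = FALSE
)

# One row per distinct student; n holds the number of rows for that student,
# and sort = TRUE puts the most frequent student first
student_counts <- df %>%
  count(Students, sort = TRUE)

student_counts
```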

Problem 3

Create a vector of five numbers of your choice between 0 and 10, save that vector to an object, and use the sum() function to calculate the sum of the numbers.

# YOUR FINAL CODE HERE
vec <- c(2, 3, 4, 5, 6)
# Use a descriptive name so the result does not mask the built-in sum() function
vec_sum <- sum(vec)
print(vec_sum)

## [1] 20
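A minimal sketch that checks the prompt's constraints (five values, each between 0 and 10) before summing, and shows how sum() behaves when a value is missing:

```r
vec <- c(2, 3, 4, 5, 6)

# Stop with an error if there are not exactly five values in the 0 to 10 range
stopifnot(length(vec) == 5, all(vec >= 0 & vec <= 10))

vec_sum <- sum(vec)
print(vec_sum)  # [1] 20

# With a missing value, sum() returns NA unless na.rm = TRUE is set
vec_with_na <- c(vec, NA)
sum(vec_with_na)                # NA
sum(vec_with_na, na.rm = TRUE)  # 20
```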

Problem 4

Create code to read the data/sci-online-classes.csv file into R using function(s) from the tidyverse. (Note: this package loads with library(tidyverse).) Save the data as an object called sci_classes.

Examine the contents of sci_classes in your console. Is your object a tibble? How do you know? (Hint: Check the output in the console.)

# YOUR FINAL CODE HERE
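library(readr)  # part of the tidyverse; provides read_csv()
sci_classes <- read_csv("data/sci-online-classes.csv")  # name requested by the prompt
sci_online_classes <- sci_classes  # same data under the name used by the code below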

## Rows: 603 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): course_id, subject, semester, section, Gradebook_Item, Gender
## dbl (23): student_id, total_points_possible, total_points_earned, percentage...
## lgl  (1): Grade_Category
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

sci_online_classes

## # A tibble: 603 × 30
##    student_id course_id  total…¹ total…² perce…³ subject semes…⁴ section Grade…⁵
##         <dbl> <chr>        <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>  
##  1      43146 FrScA-S21…    3280    2220   0.677 FrScA   S216    02      POINTS…
##  2      44638 OcnA-S116…    3531    2672   0.757 OcnA    S116    01      ATTEMP…
##  3      47448 FrScA-S21…    2870    1897   0.661 FrScA   S216    01      POINTS…
##  4      47979 OcnA-S216…    4562    3090   0.677 OcnA    S216    01      POINTS…
##  5      48797 PhysA-S11…    2207    1910   0.865 PhysA   S116    01      POINTS…
##  6      51943 FrScA-S21…    4208    3596   0.855 FrScA   S216    03      POINTS…
##  7      52326 AnPhA-S21…    4325    2255   0.521 AnPhA   S216    01      POINTS…
##  8      52446 PhysA-S11…    2086    1719   0.824 PhysA   S116    01      POINTS…
##  9      53447 FrScA-S11…    4655    3149   0.676 FrScA   S116    01      POINTS…
## 10      53475 FrScA-S11…    1710    1402   0.820 FrScA   S116    02      POINTS…
## # … with 593 more rows, 21 more variables: Grade_Category <lgl>,
## #   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
## #   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
## #   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
## #   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>,
## #   and abbreviated variable names ¹total_points_possible,
## #   ²total_points_earned, ³percentage_earned, ⁴semester, ⁵Gradebook_Item
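The first line of the printed output, "# A tibble: 603 × 30", shows that the object is a tibble. You can also check this programmatically. Because sci-online-classes.csv exists only inside the lab project, the sketch below writes a small hypothetical CSV (two columns and two rows copied from the output above) to a temporary file, assuming readr and tibble are installed:

```r
library(readr)
library(tibble)

# Hypothetical two-row CSV standing in for data/sci-online-classes.csv
csv_path <- tempfile(fileext = ".csv")
writeLines(c("student_id,subject", "43146,FrScA", "44638,OcnA"), csv_path)

# readr's read_csv() returns a tibble
classes <- read_csv(csv_path, show_col_types = FALSE)
is_tibble(classes)       # TRUE
class(classes)

# Base R's read.csv() returns a plain data.frame instead
base_version <- read.csv(csv_path)
is_tibble(base_version)  # FALSE
```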

Problem 5

Using the sci_classes data frame:

Select all columns except subject and section.
Assign to a new object with a different name.
Examine your data frame.

# YOUR FINAL CODE HERE
# Keep every column whose name is not "subject" or "section"
new_df <- sci_online_classes[, !(names(sci_online_classes) %in% c("subject", "section"))]

new_df

## # A tibble: 603 × 28
##    student_id course_id  total…¹ total…² perce…³ semes…⁴ Grade…⁵ Grade…⁶ Final…⁷
##         <dbl> <chr>        <dbl>   <dbl>   <dbl> <chr>   <chr>   <lgl>     <dbl>
##  1      43146 FrScA-S21…    3280    2220   0.677 S216    POINTS… NA         93.5
##  2      44638 OcnA-S116…    3531    2672   0.757 S116    ATTEMP… NA         81.7
##  3      47448 FrScA-S21…    2870    1897   0.661 S216    POINTS… NA         88.5
##  4      47979 OcnA-S216…    4562    3090   0.677 S216    POINTS… NA         81.9
##  5      48797 PhysA-S11…    2207    1910   0.865 S116    POINTS… NA         84  
##  6      51943 FrScA-S21…    4208    3596   0.855 S216    POINTS… NA         NA  
##  7      52326 AnPhA-S21…    4325    2255   0.521 S216    POINTS… NA         83.6
##  8      52446 PhysA-S11…    2086    1719   0.824 S116    POINTS… NA         97.8
##  9      53447 FrScA-S11…    4655    3149   0.676 S116    POINTS… NA         96.1
## 10      53475 FrScA-S11…    1710    1402   0.820 S116    POINTS… NA         NA  
## # … with 593 more rows, 19 more variables: Points_Possible <dbl>,
## #   Points_Earned <dbl>, Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>,
## #   q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>,
## #   TimeSpent <dbl>, TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>,
## #   pc <dbl>, uv <dbl>, and abbreviated variable names ¹total_points_possible,
## #   ²total_points_earned, ³percentage_earned, ⁴semester, ⁵Gradebook_Item,
## #   ⁶Grade_Category, ⁷FinalGradeCEMS
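The base-R indexing above works; since this lab emphasizes the tidyverse, dplyr's select() with a minus sign is an equivalent way to drop columns by name. A sketch on a small hypothetical tibble that reuses five column names and the first two printed rows of sci_online_classes, assuming dplyr and tibble are installed:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for sci_online_classes with a handful of its columns
classes <- tibble(
  student_id     = c(43146, 44638),
  subject        = c("FrScA", "OcnA"),
  semester       = c("S216", "S116"),
  section        = c("02", "01"),
  FinalGradeCEMS = c(93.5, 81.7)
)

# Drop subject and section; all other columns keep their original order
classes_trimmed <- classes %>%
  select(-subject, -section)

names(classes_trimmed)
```

`select(-c(subject, section))` is an equivalent spelling.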

Knit & Submit

Congratulations, you’ve completed your Data Sources Badge!

Complete the following steps to submit your work for review:

Change the author: field in the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.
Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.
Commit your changes in GitHub Desktop and push them to your online GitHub repository.
Publish your HTML page to the web using one of the following methods:
- Publish on RPubs by clicking the “Publish” button located in the Viewer Pane when you knit your document. Note that you will need to create an RPubs account first.
- Publish on GitHub using either GitHub Pages or the HTML previewer.
Post a new discussion on GitHub to our Foundations Badges forum. In your post, include a link to your published web page and write a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.
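The first submission step edits the author: field in the YAML header. To confirm what the header contains without knitting, rmarkdown's yaml_front_matter() parses it into an R list. A sketch using a hypothetical temporary .Rmd file whose title and author values are placeholders, assuming the rmarkdown package is installed:

```r
library(rmarkdown)

# Hypothetical minimal R Markdown file with a YAML header
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Data Sources Badge"',
  'author: "Your Name"',
  "output: html_document",
  "---",
  "",
  "Body text goes here."
), rmd_path)

# Parse only the YAML header into a named list
header <- yaml_front_matter(rmd_path)
header$author
header$output
```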