Data Sources badge

LASER Institute Foundation Learning Lab 1

Author

Larisa Olesova

Published

November 6, 2022

The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts for two parts:

Part I: Reflect and Plan

Use your institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies learning analytics analysis to an educational context or topic of interest. More specifically, locate a study that makes use of one of the data structures we learned today. You are also welcome to select one of your research papers.

  1. Provide an APA citation for your selected study.

    • Brouwer, J., Fernandes, C., Steglich, C., Jansen, E., Hofman, W. H., & Flache, A. (2022). The development of peer networks and academic performance in learning communities in higher education. Learning and Instruction, 80. https://doi.org/10.1016/j.learninstruc.2022.101603

  2. What types of data are associated with LA?

    • Student information systems: structured administrative data.
  3. What type of data structures are analyzed in the educational context?

    • Friendship nomination, help-seeking nomination, and GPA as academic performance.
  4. How might this article be used to better understand a dataset or educational context of personal or professional interest to you?

    • It shows how the authors examined learning communities.
  5. Finally, how do these processes compare with what teachers and educational organizations already do to support and assess student learning?

    • This study is about using small groups to improve first-year students’ academic performance and support a successful transition.

Draft a research question guided by techniques and data sources that you are potentially interested in exploring in more depth.

    • Is interaction among peers different across role-based, debate, and case-based discussions?

  1. What data source(s) should be analyzed or discussed?

    • LMS discussion board LA
  2. What is the purpose of your article?

    • To examine how the network is formed based on the design of online discussions.
  3. Explain the analytical level at which these data would need to be collected and analyzed.

    • Whom students interact with during discussions: do they choose friends or random peers?
  4. How, if at all, will your article touch upon the application(s) of LA to “understand and improve learning and the contexts in which learning occurs?”

    • The article I chose will guide me through the analysis as I craft my own study. I will follow the authors’ steps when analyzing my own dataset.
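One way to picture the discussion-board data behind this research question is as an edge list, where each row records one student replying to another. The sketch below is only an illustration under that assumption: the `replies` data frame, its column names, and all of its values are invented, not taken from any LMS export.

```r
# Hypothetical reply log: each row is one reply from one student to another.
# All names and values below are invented for illustration.
replies <- data.frame(
  from = c("A", "A", "B", "C", "C", "D"),
  to = c("B", "C", "A", "A", "D", "C"),
  discussion_type = c("role-based", "debate", "role-based",
                      "case-based", "debate", "debate")
)

# Number of replies (edges) observed in each discussion design
edges_per_type <- table(replies$discussion_type)
edges_per_type

# Number of distinct peers each student replied to (a simple out-degree)
out_degree <- tapply(replies$to, replies$from, function(x) length(unique(x)))
out_degree
```

Comparing counts like these across discussion designs would be a first descriptive step before any formal network analysis.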

Part II: Data Product

After you finish the script file for lab1_badge, add it to the community board.

Problem 1

Create a data frame that includes two columns, one named “Students” and the other named “Foods”. The first column should be this vector (note the intentional repeated values): Thor, Rogue, Electra, Electra, Wolverine

The second column should be this vector: Bread, Orange, Chocolate, Carrots, Milk

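# YOUR FINAL CODE HERE
Students <- c("Thor", "Rogue", "Electra", "Electra", "Wolverine")
Foods <- c("Bread", "Orange", "Chocolate", "Carrots", "Milk")
# Combine the two vectors into a data frame; the object name is illustrative
student_foods <- data.frame(Students, Foods)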

Problem 2

Using the data frame created in Problem 1, use the table() command to create a frequency table for the column called “Students”.

table(Students)
Students
  Electra     Rogue      Thor Wolverine 
        2         1         1         1 
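Problems 1 and 2 ask for a data frame and a frequency table built from one of its columns. A minimal sketch of that approach follows; the object names `student_foods`, `student_counts`, and `repeated` are illustrative choices, not names required by the lab.

```r
# Build the data frame from the two vectors given in Problem 1
student_foods <- data.frame(
  Students = c("Thor", "Rogue", "Electra", "Electra", "Wolverine"),
  Foods = c("Bread", "Orange", "Chocolate", "Carrots", "Milk")
)

# Frequency table of the Students column, pulled from the data frame with $
student_counts <- table(student_foods$Students)
student_counts

# Names that appear more than once
repeated <- names(student_counts[student_counts > 1])
repeated
```

Pulling the column with `$` keeps the table tied to the data frame rather than to a free-standing vector that happens to share its name.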

Problem 3

Create a vector of five numbers of your choice between 0 and 10, save that vector to an object, and use the sum() function to calculate the sum of the numbers.

# YOUR FINAL CODE HERE
# A vector of five numbers between 0 and 10
vec <- c(1, 3, 5, 7, 9)
vec
[1] 1 3 5 7 9
sum(vec)
[1] 25
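The prompt asks for exactly five values between 0 and 10, so a quick check before summing can catch a vector that is too short or out of range. This is a small sketch with arbitrarily chosen values; `vec` and `total` are just illustrative names.

```r
# Five numbers between 0 and 10 (values chosen arbitrarily for illustration)
vec <- c(1, 3, 5, 7, 9)

# Check the constraints from the prompt before summing;
# stopifnot() raises an error if any condition is FALSE
stopifnot(length(vec) == 5, all(vec >= 0 & vec <= 10))

total <- sum(vec)
total
```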

Problem 4

Create code to read the data/sci-online-classes.csv file into R using function(s) from the tidyverse. (Note: this package loads with library(tidyverse).) Save the data as an object called sci_classes.

Examine the contents of sci_classes in your console. Is your object a tibble? How do you know? (Hint: Check the output in the console.)

# YOUR FINAL CODE HERE
library(readr)
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ dplyr   1.0.10
✔ tibble  3.1.8      ✔ stringr 1.4.1 
✔ tidyr   1.2.1      ✔ forcats 0.5.2 
✔ purrr   0.3.4      
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
sci_online_classes <- read_csv("data/sci-online-classes.csv")
Rows: 603 Columns: 30
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): course_id, subject, semester, section, Gradebook_Item, Gender
dbl (23): student_id, total_points_possible, total_points_earned, percentage...
lgl  (1): Grade_Category

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sci_online_classes
# A tibble: 603 × 30
   student_id course_id  total…¹ total…² perce…³ subject semes…⁴ section Grade…⁵
        <dbl> <chr>        <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>  
 1      43146 FrScA-S21…    3280    2220   0.677 FrScA   S216    02      POINTS…
 2      44638 OcnA-S116…    3531    2672   0.757 OcnA    S116    01      ATTEMP…
 3      47448 FrScA-S21…    2870    1897   0.661 FrScA   S216    01      POINTS…
 4      47979 OcnA-S216…    4562    3090   0.677 OcnA    S216    01      POINTS…
 5      48797 PhysA-S11…    2207    1910   0.865 PhysA   S116    01      POINTS…
 6      51943 FrScA-S21…    4208    3596   0.855 FrScA   S216    03      POINTS…
 7      52326 AnPhA-S21…    4325    2255   0.521 AnPhA   S216    01      POINTS…
 8      52446 PhysA-S11…    2086    1719   0.824 PhysA   S116    01      POINTS…
 9      53447 FrScA-S11…    4655    3149   0.676 FrScA   S116    01      POINTS…
10      53475 FrScA-S11…    1710    1402   0.820 FrScA   S116    02      POINTS…
# … with 593 more rows, 21 more variables: Grade_Category <lgl>,
#   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
#   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
#   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
#   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>,
#   and abbreviated variable names ¹​total_points_possible,
#   ²​total_points_earned, ³​percentage_earned, ⁴​semester, ⁵​Gradebook_Item
glimpse(sci_online_classes)
Rows: 603
Columns: 30
$ student_id            <dbl> 43146, 44638, 47448, 47979, 48797, 51943, 52326,…
$ course_id             <chr> "FrScA-S216-02", "OcnA-S116-01", "FrScA-S216-01"…
$ total_points_possible <dbl> 3280, 3531, 2870, 4562, 2207, 4208, 4325, 2086, …
$ total_points_earned   <dbl> 2220, 2672, 1897, 3090, 1910, 3596, 2255, 1719, …
$ percentage_earned     <dbl> 0.6768293, 0.7567261, 0.6609756, 0.6773345, 0.86…
$ subject               <chr> "FrScA", "OcnA", "FrScA", "OcnA", "PhysA", "FrSc…
$ semester              <chr> "S216", "S116", "S216", "S216", "S116", "S216", …
$ section               <chr> "02", "01", "01", "01", "01", "03", "01", "01", …
$ Gradebook_Item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "ATTEMPTE…
$ Grade_Category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ FinalGradeCEMS        <dbl> 93.45372, 81.70184, 88.48758, 81.85260, 84.00000…
$ Points_Possible       <dbl> 5, 10, 10, 5, 438, 5, 10, 10, 443, 5, 12, 10, 5,…
$ Points_Earned         <dbl> NA, 10.00, NA, 4.00, 399.00, NA, NA, 10.00, 425.…
$ Gender                <chr> "M", "F", "M", "M", "F", "F", "M", "F", "F", "M"…
$ q1                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q2                    <dbl> 4, 4, 4, 5, 3, NA, 5, 3, 3, NA, NA, 5, 3, 3, NA,…
$ q3                    <dbl> 4, 3, 4, 3, 3, NA, 3, 3, 3, NA, NA, 3, 3, 5, NA,…
$ q4                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 3, 5, NA,…
$ q5                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 4, 5, NA,…
$ q6                    <dbl> 5, 4, 4, 5, 4, NA, 5, 4, 3, NA, NA, 5, 3, 5, NA,…
$ q7                    <dbl> 5, 4, 4, 4, 4, NA, 4, 3, 3, NA, NA, 5, 3, 5, NA,…
$ q8                    <dbl> 5, 5, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q9                    <dbl> 4, 4, 3, 5, NA, NA, 5, 3, 2, NA, NA, 5, 2, 2, NA…
$ q10                   <dbl> 5, 4, 5, 5, 3, NA, 5, 3, 5, NA, NA, 4, 4, 5, NA,…
$ TimeSpent             <dbl> 1555.1667, 1382.7001, 860.4335, 1598.6166, 1481.…
$ TimeSpent_hours       <dbl> 25.91944500, 23.04500167, 14.34055833, 26.643610…
$ TimeSpent_std         <dbl> -0.18051496, -0.30780313, -0.69325954, -0.148446…
$ int                   <dbl> 5.0, 4.2, 5.0, 5.0, 3.8, 4.6, 5.0, 3.0, 4.2, NA,…
$ pc                    <dbl> 4.50, 3.50, 4.00, 3.50, 3.50, 4.00, 3.50, 3.00, …
$ uv                    <dbl> 4.333333, 4.000000, 3.666667, 5.000000, 3.500000…
as_tibble(sci_online_classes)
# A tibble: 603 × 30
   student_id course_id  total…¹ total…² perce…³ subject semes…⁴ section Grade…⁵
        <dbl> <chr>        <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>  
 1      43146 FrScA-S21…    3280    2220   0.677 FrScA   S216    02      POINTS…
 2      44638 OcnA-S116…    3531    2672   0.757 OcnA    S116    01      ATTEMP…
 3      47448 FrScA-S21…    2870    1897   0.661 FrScA   S216    01      POINTS…
 4      47979 OcnA-S216…    4562    3090   0.677 OcnA    S216    01      POINTS…
 5      48797 PhysA-S11…    2207    1910   0.865 PhysA   S116    01      POINTS…
 6      51943 FrScA-S21…    4208    3596   0.855 FrScA   S216    03      POINTS…
 7      52326 AnPhA-S21…    4325    2255   0.521 AnPhA   S216    01      POINTS…
 8      52446 PhysA-S11…    2086    1719   0.824 PhysA   S116    01      POINTS…
 9      53447 FrScA-S11…    4655    3149   0.676 FrScA   S116    01      POINTS…
10      53475 FrScA-S11…    1710    1402   0.820 FrScA   S116    02      POINTS…
# … with 593 more rows, 21 more variables: Grade_Category <lgl>,
#   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
#   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
#   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
#   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>,
#   and abbreviated variable names ¹​total_points_possible,
#   ²​total_points_earned, ³​percentage_earned, ⁴​semester, ⁵​Gradebook_Item
sci_classes <- sci_online_classes
sci_online_classes %>% select(c(!subject, !section))
# A tibble: 603 × 30
   student_id course_id  total…¹ total…² perce…³ semes…⁴ section Grade…⁵ Grade…⁶
        <dbl> <chr>        <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <lgl>  
 1      43146 FrScA-S21…    3280    2220   0.677 S216    02      POINTS… NA     
 2      44638 OcnA-S116…    3531    2672   0.757 S116    01      ATTEMP… NA     
 3      47448 FrScA-S21…    2870    1897   0.661 S216    01      POINTS… NA     
 4      47979 OcnA-S216…    4562    3090   0.677 S216    01      POINTS… NA     
 5      48797 PhysA-S11…    2207    1910   0.865 S116    01      POINTS… NA     
 6      51943 FrScA-S21…    4208    3596   0.855 S216    03      POINTS… NA     
 7      52326 AnPhA-S21…    4325    2255   0.521 S216    01      POINTS… NA     
 8      52446 PhysA-S11…    2086    1719   0.824 S116    01      POINTS… NA     
 9      53447 FrScA-S11…    4655    3149   0.676 S116    01      POINTS… NA     
10      53475 FrScA-S11…    1710    1402   0.820 S116    02      POINTS… NA     
# … with 593 more rows, 21 more variables: FinalGradeCEMS <dbl>,
#   Points_Possible <dbl>, Points_Earned <dbl>, Gender <chr>, q1 <dbl>,
#   q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>,
#   q9 <dbl>, q10 <dbl>, TimeSpent <dbl>, TimeSpent_hours <dbl>,
#   TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>, subject <chr>, and
#   abbreviated variable names ¹​total_points_possible, ²​total_points_earned,
#   ³​percentage_earned, ⁴​semester, ⁵​Gradebook_Item, ⁶​Grade_Category
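Besides reading the "# A tibble" header in the console output, the question of whether an object is a tibble can be answered programmatically. The sketch below assumes a tiny stand-in CSV written to a temporary file (its columns and values are invented) so that it runs without the lab's data folder.

```r
library(readr)
library(tibble)

# Write a tiny stand-in CSV to a temporary file; the contents are invented
tmp <- tempfile(fileext = ".csv")
writeLines(c("student_id,subject", "1,FrScA", "2,OcnA"), tmp)

# read_csv() from readr returns a tibble; base read.csv() returns a plain data frame
toy_tbl <- read_csv(tmp, show_col_types = FALSE)
toy_df <- read.csv(tmp)

# is_tibble() answers the question directly
is_tibble(toy_tbl)
is_tibble(toy_df)

# The class vector of a tibble includes "tbl_df"
class(toy_tbl)
```

Here `is_tibble()` should return TRUE for the readr result and FALSE for the base R result, which mirrors what the console header already suggests.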

Problem 5

Using the sci_classes data frame:

  1. Select all columns except subject and section.

  2. Assign to a new object with a different name.

  3. Examine your data frame.
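A note on the select() call shown at the end of the Problem 4 output: its result still reports 603 × 30, with subject moved to the end, because c(!subject, !section) takes the union of "everything except subject" and "everything except section", which is every column. Negating the combined set is one way to drop both. The sketch below uses an invented toy tibble as a stand-in for sci_classes, and `classes_trimmed` is a hypothetical name for the new object.

```r
library(dplyr)
library(tibble)

# Toy stand-in for sci_classes; the values are invented for illustration
toy_classes <- tibble(
  student_id = c(1, 2),
  subject = c("FrScA", "OcnA"),
  section = c("01", "02"),
  TimeSpent = c(120, 95)
)

# c(!subject, !section) is a union of two complements, so every column survives
kept_all <- toy_classes %>% select(c(!subject, !section))
names(kept_all)

# Negating the combined set drops both columns; assign to a new object
classes_trimmed <- toy_classes %>% select(!c(subject, section))
names(classes_trimmed)

# Examine the new object
glimpse(classes_trimmed)
```

An equivalent spelling is select(-subject, -section); either form applied to sci_classes should leave 28 of the 30 columns.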

Knit & Submit

Congratulations, you’ve completed your Data Sources Badge!

Complete the following steps to submit your work for review:

  1. Change the name of the author: in the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.

  2. Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.

  3. Commit your changes in GitHub Desktop and push them to your online GitHub repository.

  4. Publish your HTML page to the web using one of the following publishing methods: publish on RPubs by clicking the “Publish” button located in the Viewer pane when you knit your document (note that you will first need to create an RPubs account), or publish on GitHub using either GitHub Pages or the HTML previewer.

  5. Post a new discussion on GitHub to our Foundations Badges forum. In your post, include a link to your published web page and write a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.