LA Foundations badge

LASER Institute Foundation Learning Lab 1

Author

Teresa Leavens

Published

July 18, 2025

The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts for two parts:

In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will create a simple data product in R that demonstrates your ability to apply a data analysis technique introduced in this learning lab.

Part I: Reflect and Plan

Use the institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies learning analytics to an educational context or topic of interest. More specifically, locate a study that makes use of one of the data structures we learned today. You are also welcome to select one of your own research papers.

Provide an APA citation for your selected study.

Quigley, D., McNamara, C., Ostwald, J., & Sumner, T. (2017). Using learning analytics to understand scientific modeling in the classroom. Frontiers in ICT, 4, 24.
What types of data are associated with LA?
- Interactive data of students on digital tool Ecosurvey for science modeling, including organisms selected, interactions between organisms, and sequence of interactions
- Administrative data on students nested in classrooms
What type of data structures are analyzed in the educational context?
- The data from Ecosurvey is a structured dataset with the student user, students’ teacher, number of organisms included in model, number of interactions included in model, number of interactions per organism, and time history of model changes.
How might this article be used to better understand a dataset or educational context of personal or professional interest to you?
- I work with preservice elementary teachers to support their incorporation of modeling in the science classroom. I am particularly interested in how the authors quantify students’ modeling expertise from the activity data, and how teachers could use this information to help assess students’ content knowledge about ecosystems and their modeling metaknowledge.
Finally, how do these processes compare with what teachers and educational organizations already do to support and assess student learning?
- For ecosystems, assessment is usually summative and focused on the final product, so it doesn’t support building students’ modeling metaknowledge. Teachers are also usually not assessing how students’ understanding of ecosystems is progressing, which would let them adapt their instruction to students’ needs and current levels of content knowledge.

Draft a research question guided by techniques and data sources that you are potentially interested in exploring in more depth.

What data source(s) should be analyzed or discussed?
- Data on teachers’ interactions with students should be collected, since Quigley et al. found that students’ modeling activity depended on their classroom teacher. This includes the types of questions teachers ask, how the teacher structures the non-digital activities that support students’ knowledge for modeling, and the classroom time allocated to modeling. Teacher preparation background would also be useful for examining correlations between types of teacher preparation and pedagogical content knowledge for supporting modeling in the science classroom.
What is the purpose of your article?
- The purpose of the original article by Quigley et al. was to communicate how clickstream data from a digital tool can be used to assess students’ understanding of modeling an ecosystem and how the classroom environment may be correlated with students’ interaction with the tool and their understanding of ecosystems.
- I would expand on their original work by using data analytics to understand teacher moves that support more robust student interactions with Ecosurvey to build their content knowledge and modeling metaknowledge. This can be used as guidance on important instructional practices to focus on with preservice teachers in their preparation, as well as in professional development with in-service teachers.
Explain the analytical level at which these data would need to be collected and analyzed.
- This would require fine-grained data on events happening in the classroom, such as the types and timing of learning activities, in addition to work on Ecosurvey. It would also require fine-grained data on the frequency of teacher and student interactions and the types of teacher moves that occur while students are working on their models.
How, if at all, will your article touch upon the application(s) of LA to “understand and improve learning and the contexts in which learning occurs?”
- Analysis of this data will help understand instructional practices and activities in the classroom that support student learning. As mentioned in answer to #2, this knowledge can guide preservice teacher preparation and professional development of in-service teachers.

Part II: Data Product

In our Learning Analytics code-along, we scratched the surface on the number of ways that we can wrangle the data.

Using one of the data sets provided in the data folder, your goal for this lab is to extend the Learning Analytics Workflow from our code-along by preparing and wrangling different data.

Alternatively, you may use your own data set in the workflow. If you do decide to use your own data set, you must include the following:

Show two different ways to use the select function with your data; inspect and save each result as a new object.
Show one way to use the filter function with your data; inspect and save the result as a new object.
Show one way to use the arrange function with your data; inspect and save the result as a new object.
Use the pipe operator to bring it all together.
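
The four requirements above can be sketched on a small made-up table. This is only an illustration: the tibble my_data and its columns (id, group, score, notes) are hypothetical stand-ins for your own data set, not part of the lab materials.

```r
library(dplyr)

# Hypothetical stand-in for "your own data set"; replace with your real data.
my_data <- tibble(
  id    = c(1, 2, 3, 4),
  group = c("a", "b", "a", "b"),
  score = c(90, 72, NA, 85),
  notes = c("x", "y", "z", "w")
)

# select, way 1: keep columns by name
by_name <- my_data %>% select(id, group, score)

# select, way 2: drop a column with a minus sign
without_notes <- my_data %>% select(-notes)

# filter: keep only rows in group "a"
group_a <- my_data %>% filter(group == "a")

# arrange: sort by score, highest first (rows with NA are placed last)
by_score <- my_data %>% arrange(desc(score))

# pipe: chain select, filter, and arrange into one statement
final <- my_data %>%
  select(id, group, score) %>%
  filter(group == "b") %>%
  arrange(desc(score))

# inspect one of the new objects
glimpse(final)
```

Saving each step under its own name keeps the original data untouched, so any step can be rerun or inspected on its own.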

Feel free to create a new script in your lab 2 to work through the following problems. Then when satisfied add the code in the code chunks below. Don’t forget to run the code to make sure it works.

Instructions:

Add your name to the document in author.
Set up the first phase (or the first two, if using an Introduction) of the LA workflow below. I’ve added the wrangle section for you. You will need to Prepare the libraries necessary to wrangle the data.
In the chunk called read-data: Import the sci-online-classes.csv from the data folder and save as a new object called sci_classes. Then inspect your data using a function of your choice.

#load tidyverse
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

#import
read_data <- read_csv("data/sci-online-classes.csv")

Rows: 603 Columns: 30
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): course_id, subject, semester, section, Gradebook_Item, Gender
dbl (23): student_id, total_points_possible, total_points_earned, percentage...
lgl  (1): Grade_Category

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
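
The message above suggests either specifying column types or passing show_col_types = FALSE. A minimal sketch of the first option follows. To stay self-contained it reads a three-row literal string rather than the real file; the excerpt and the object names csv_text and sci_small are assumptions made for illustration, and reading literal data with I() assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical three-row excerpt standing in for data/sci-online-classes.csv
csv_text <- I(
  "student_id,subject,FinalGradeCEMS\n43146,FrScA,93.5\n44638,OcnA,81.7\n51943,FrScA,NA\n"
)

# Declaring the column types up front documents what each column should be
# and suppresses the column specification message.
sci_small <- read_csv(
  csv_text,
  col_types = cols(
    student_id     = col_double(),
    subject        = col_character(),
    FinalGradeCEMS = col_double()
  )
)

# Retrieve the full column specification that was used
spec(sci_small)
```

With the real file, the same col_types argument could be added to the read_csv("data/sci-online-classes.csv") call.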

#inspect your data
read_data

# A tibble: 603 × 30
   student_id course_id     total_points_possible total_points_earned
        <dbl> <chr>                         <dbl>               <dbl>
 1      43146 FrScA-S216-02                  3280                2220
 2      44638 OcnA-S116-01                   3531                2672
 3      47448 FrScA-S216-01                  2870                1897
 4      47979 OcnA-S216-01                   4562                3090
 5      48797 PhysA-S116-01                  2207                1910
 6      51943 FrScA-S216-03                  4208                3596
 7      52326 AnPhA-S216-01                  4325                2255
 8      52446 PhysA-S116-01                  2086                1719
 9      53447 FrScA-S116-01                  4655                3149
10      53475 FrScA-S116-02                  1710                1402
# ℹ 593 more rows
# ℹ 26 more variables: percentage_earned <dbl>, subject <chr>, semester <chr>,
#   section <chr>, Gradebook_Item <chr>, Grade_Category <lgl>,
#   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
#   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
#   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
#   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>

In the select-1 code chunk: Use the ‘select’ function to select student_id, subject, semester, FinalGradeCEMS. Assign to a new object with a different name (you choose the name).

# Type your code here
read_data_sub <- read_data %>%
  select(student_id, subject, semester, FinalGradeCEMS)
  
#inspect your data
head(read_data_sub)

# A tibble: 6 × 4
  student_id subject semester FinalGradeCEMS
       <dbl> <chr>   <chr>             <dbl>
1      43146 FrScA   S216               93.5
2      44638 OcnA    S116               81.7
3      47448 FrScA   S216               88.5
4      47979 OcnA    S216               81.9
5      48797 PhysA   S116               84  
6      51943 FrScA   S216               NA

What do you notice about FinalGradeCEMS? (*Hint: NAs?)

The variable FinalGradeCEMS is a double (numeric) data type, shown as <dbl>, and it has missing values denoted by NA. I am wondering how these are handled by R (i.e., whether they can be left as NA). During analysis, I will also need to determine how best to handle the missing values.
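
One way to explore the question about missing values is sketched here, using only the six rounded FinalGradeCEMS values shown in the head() output. The object name grades is made up for this example. R can store NA in a numeric column without any problem, but most summary functions return NA unless told to skip missing values.

```r
library(dplyr)

# The six rows printed by head(read_data_sub), using the rounded display values
grades <- tibble(
  student_id     = c(43146, 44638, 47448, 47979, 48797, 51943),
  FinalGradeCEMS = c(93.5, 81.7, 88.5, 81.9, 84, NA)
)

# Storage type of the column: "double", not "integer"
typeof(grades$FinalGradeCEMS)

# Count how many values are missing
sum(is.na(grades$FinalGradeCEMS))

# A single NA makes the mean NA ...
mean(grades$FinalGradeCEMS)

# ... unless missing values are explicitly removed first
mean(grades$FinalGradeCEMS, na.rm = TRUE)
```

Whether to drop, keep, or impute the missing grades is an analysis decision; na.rm = TRUE only skips them for that one calculation.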

In code chunk named select-2 select all columns except subject and section. Assign to a new object with a different name. Inspect your data frame with a different function.

# Type your code here
# Drop subject (column 6) and section (column 8) by position
read_data2 <- select(read_data, -6, -8)

#inspect data
glimpse(read_data2)

Rows: 603
Columns: 28
$ student_id            <dbl> 43146, 44638, 47448, 47979, 48797, 51943, 52326,…
$ course_id             <chr> "FrScA-S216-02", "OcnA-S116-01", "FrScA-S216-01"…
$ total_points_possible <dbl> 3280, 3531, 2870, 4562, 2207, 4208, 4325, 2086, …
$ total_points_earned   <dbl> 2220, 2672, 1897, 3090, 1910, 3596, 2255, 1719, …
$ percentage_earned     <dbl> 0.6768293, 0.7567261, 0.6609756, 0.6773345, 0.86…
$ semester              <chr> "S216", "S116", "S216", "S216", "S116", "S216", …
$ Gradebook_Item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "ATTEMPTE…
$ Grade_Category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ FinalGradeCEMS        <dbl> 93.45372, 81.70184, 88.48758, 81.85260, 84.00000…
$ Points_Possible       <dbl> 5, 10, 10, 5, 438, 5, 10, 10, 443, 5, 12, 10, 5,…
$ Points_Earned         <dbl> NA, 10.00, NA, 4.00, 399.00, NA, NA, 10.00, 425.…
$ Gender                <chr> "M", "F", "M", "M", "F", "F", "M", "F", "F", "M"…
$ q1                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q2                    <dbl> 4, 4, 4, 5, 3, NA, 5, 3, 3, NA, NA, 5, 3, 3, NA,…
$ q3                    <dbl> 4, 3, 4, 3, 3, NA, 3, 3, 3, NA, NA, 3, 3, 5, NA,…
$ q4                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 3, 5, NA,…
$ q5                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 4, 5, NA,…
$ q6                    <dbl> 5, 4, 4, 5, 4, NA, 5, 4, 3, NA, NA, 5, 3, 5, NA,…
$ q7                    <dbl> 5, 4, 4, 4, 4, NA, 4, 3, 3, NA, NA, 5, 3, 5, NA,…
$ q8                    <dbl> 5, 5, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q9                    <dbl> 4, 4, 3, 5, NA, NA, 5, 3, 2, NA, NA, 5, 2, 2, NA…
$ q10                   <dbl> 5, 4, 5, 5, 3, NA, 5, 3, 5, NA, NA, 4, 4, 5, NA,…
$ TimeSpent             <dbl> 1555.1667, 1382.7001, 860.4335, 1598.6166, 1481.…
$ TimeSpent_hours       <dbl> 25.91944500, 23.04500167, 14.34055833, 26.643610…
$ TimeSpent_std         <dbl> -0.18051496, -0.30780313, -0.69325954, -0.148446…
$ int                   <dbl> 5.0, 4.2, 5.0, 5.0, 3.8, 4.6, 5.0, 3.0, 4.2, NA,…
$ pc                    <dbl> 4.50, 3.50, 4.00, 3.50, 3.50, 4.00, 3.50, 3.00, …
$ uv                    <dbl> 4.333333, 4.000000, 3.666667, 5.000000, 3.500000…
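
Columns can be dropped by position or by name. The sketch below compares the two on a small made-up tibble (toy, with only five of the real column names), so the positions differ from those in the full data set. Dropping by name is usually less fragile, because it keeps working if the column order changes.

```r
library(dplyr)

# Hypothetical two-row table reusing a few column names from the data set
toy <- tibble(
  student_id = c(43146, 44638),
  course_id  = c("FrScA-S216-02", "OcnA-S116-01"),
  subject    = c("FrScA", "OcnA"),
  semester   = c("S216", "S116"),
  section    = c("02", "01")
)

# Drop by position: in this toy table subject is column 3 and section is column 5
by_position <- select(toy, -3, -5)

# Drop by name: independent of column order
by_name <- select(toy, -c(subject, section))

# Both approaches keep the same columns
names(by_position)
names(by_name)
```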

In the code chunk named filter-1, Filter the sci_classes data frame for students in OcnA courses. Assign to a new object with a different name. Use the head() function to examine your data frame.

#Type your code here
read_data3 <- read_data %>% filter(subject == "OcnA")

#inspect your data
head(read_data3)

# A tibble: 6 × 30
  student_id course_id    total_points_possible total_points_earned
       <dbl> <chr>                        <dbl>               <dbl>
1      44638 OcnA-S116-01                  3531                2672
2      47979 OcnA-S216-01                  4562                3090
3      54066 OcnA-S116-01                  4641                3429
4      54282 OcnA-S116-02                  3581                2777
5      54342 OcnA-S116-02                  3256                2876
6      54346 OcnA-S116-01                  4471                3773
# ℹ 26 more variables: percentage_earned <dbl>, subject <chr>, semester <chr>,
#   section <chr>, Gradebook_Item <chr>, Grade_Category <lgl>,
#   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
#   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
#   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
#   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>

Q: How many rows does the head() function display? Hint: Check the dimensions of your tibble in the console.

By default, head() displays the first 6 rows of observations.
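
A quick way to check this, sketched on a hypothetical ten-row tibble (toy is not part of the lab data): head() returns six rows unless its n argument says otherwise, while dim() and nrow() report the size of the whole table.

```r
library(dplyr)

# Hypothetical ten-row table
toy <- tibble(row_id = 1:10, value = (1:10)^2)

# Default: the first six rows
nrow(head(toy))

# The n argument changes how many rows are returned
nrow(head(toy, n = 3))

# Dimensions of the full table: rows, then columns
dim(toy)
```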

In code chunk named filter-2, filter the sci_classes data frame so rows with NA for points earned are removed. Assign to a new object with a different name. Use glimpse() to examine all columns of your data frame.

# Remove rows where Points_Earned is NA
read_data4 <- read_data %>% drop_na(Points_Earned)


#inspect data 
glimpse(read_data4)

Rows: 511
Columns: 30
$ student_id            <dbl> 44638, 47979, 48797, 52446, 53447, 53475, 53475,…
$ course_id             <chr> "OcnA-S116-01", "OcnA-S216-01", "PhysA-S116-01",…
$ total_points_possible <dbl> 3531, 4562, 2207, 2086, 4655, 1710, 1209, 4641, …
$ total_points_earned   <dbl> 2672, 3090, 1910, 1719, 3149, 1402, 977, 3429, 2…
$ percentage_earned     <dbl> 0.7567261, 0.6773345, 0.8654282, 0.8240652, 0.67…
$ subject               <chr> "OcnA", "OcnA", "PhysA", "PhysA", "FrScA", "FrSc…
$ semester              <chr> "S116", "S216", "S116", "S116", "S116", "S116", …
$ section               <chr> "01", "01", "01", "01", "01", "02", "01", "01", …
$ Gradebook_Item        <chr> "ATTEMPTED", "POINTS EARNED & TOTAL COURSE POINT…
$ Grade_Category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ FinalGradeCEMS        <dbl> 81.70184, 81.85260, 84.00000, 97.77778, 96.11872…
$ Points_Possible       <dbl> 10, 5, 438, 10, 443, 5, 12, 10, 5, 10, 220, 30, …
$ Points_Earned         <dbl> 10.00, 4.00, 399.00, 10.00, 425.00, 2.50, 12.00,…
$ Gender                <chr> "F", "M", "F", "F", "F", "M", "M", "M", "F", "F"…
$ q1                    <dbl> 4, 5, 4, 3, 4, NA, NA, 4, 3, 5, NA, 4, 4, NA, 4,…
$ q2                    <dbl> 4, 5, 3, 3, 3, NA, NA, 5, 3, 3, NA, 2, 4, NA, 3,…
$ q3                    <dbl> 3, 3, 3, 3, 3, NA, NA, 3, 3, 5, NA, 2, 3, NA, 3,…
$ q4                    <dbl> 4, 5, 4, 3, 4, NA, NA, 5, 3, 5, NA, 4, 5, NA, 4,…
$ q5                    <dbl> 4, 5, 4, 3, 4, NA, NA, 5, 4, 5, NA, 4, 4, NA, 4,…
$ q6                    <dbl> 4, 5, 4, 4, 3, NA, NA, 5, 3, 5, NA, 4, 4, NA, 3,…
$ q7                    <dbl> 4, 4, 4, 3, 3, NA, NA, 5, 3, 5, NA, 4, 5, NA, 3,…
$ q8                    <dbl> 5, 5, 4, 3, 4, NA, NA, 4, 3, 5, NA, 4, 4, NA, 4,…
$ q9                    <dbl> 4, 5, NA, 3, 2, NA, NA, 5, 2, 2, NA, 2, 4, NA, 2…
$ q10                   <dbl> 4, 5, 3, 3, 5, NA, NA, 4, 4, 5, NA, 4, 4, NA, 3,…
$ TimeSpent             <dbl> 1382.7001, 1598.6166, 1481.8000, 1390.2167, 1479…
$ TimeSpent_hours       <dbl> 23.04500167, 26.64361000, 24.69666667, 23.170278…
$ TimeSpent_std         <dbl> -0.30780313, -0.14844697, -0.23466291, -0.302255…
$ int                   <dbl> 4.2, 5.0, 3.8, 3.0, 4.2, NA, NA, 4.4, 3.4, 4.7, …
$ pc                    <dbl> 3.50, 3.50, 3.50, 3.00, 3.00, NA, NA, 4.00, 3.00…
$ uv                    <dbl> 4.000000, 5.000000, 3.500000, 3.333333, 2.666667…
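
Since the prompt asks for a filter, it may help to see that filter() combined with is.na() removes the same rows as drop_na(). This is a sketch on a made-up four-row tibble (toy) that reuses the first four student_id and Points_Earned values from the glimpse() output of the full data.

```r
library(dplyr)
library(tidyr)

# First four student_id / Points_Earned pairs from the full data set
toy <- tibble(
  student_id    = c(43146, 44638, 47448, 47979),
  Points_Earned = c(NA, 10, NA, 4)
)

# Option 1: tidyr::drop_na() removes rows with NA in the named column
with_drop_na <- toy %>% drop_na(Points_Earned)

# Option 2: dplyr::filter() keeps rows where the column is not NA
with_filter <- toy %>% filter(!is.na(Points_Earned))

# Both keep the same two students
with_drop_na
with_filter
```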

In the code chunk called arrange-1, Arrange sci_classes data by subject then percentage_earned in descending order. Assign to a new object. Use the str() function to examine the data type of each column in your data frame.

# Sort by subject (Z to A), then by percentage_earned (highest first)
read_data6 <- read_data %>%
  arrange(desc(subject),
          desc(percentage_earned))

#inspect data
str(read_data6)

spc_tbl_ [603 × 30] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ student_id           : num [1:603] 92733 62576 94189 48797 87171 ...
 $ course_id            : chr [1:603] "PhysA-S116-01" "PhysA-S116-01" "PhysA-S216-01" "PhysA-S116-01" ...
 $ total_points_possible: num [1:603] 2829 2215 3682 2207 6318 ...
 $ total_points_earned  : num [1:603] 2549 1931 3187 1910 5466 ...
 $ percentage_earned    : num [1:603] 0.901 0.872 0.866 0.865 0.865 ...
 $ subject              : chr [1:603] "PhysA" "PhysA" "PhysA" "PhysA" ...
 $ semester             : chr [1:603] "S116" "S116" "S216" "S116" ...
 $ section              : chr [1:603] "01" "01" "01" "01" ...
 $ Gradebook_Item       : chr [1:603] "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" ...
 $ Grade_Category       : logi [1:603] NA NA NA NA NA NA ...
 $ FinalGradeCEMS       : num [1:603] 94.2 94.4 90.4 84 97.5 ...
 $ Points_Possible      : num [1:603] 457 10 10 438 443 10 5 597 140 105 ...
 $ Points_Earned        : num [1:603] 382 10 10 399 398 ...
 $ Gender               : chr [1:603] "F" "M" "F" "F" ...
 $ q1                   : num [1:603] 5 3 4 4 4 5 5 5 3 NA ...
 $ q2                   : num [1:603] 3 4 4 3 4 5 5 4 4 NA ...
 $ q3                   : num [1:603] 4 2 5 3 3 5 3 4 2 NA ...
 $ q4                   : num [1:603] 5 4 3 4 4 5 4 5 4 NA ...
 $ q5                   : num [1:603] 4 3 4 4 4 5 4 5 2 NA ...
 $ q6                   : num [1:603] 4 3 4 4 4 5 4 3 4 NA ...
 $ q7                   : num [1:603] 3 4 2 4 3 2 3 5 4 NA ...
 $ q8                   : num [1:603] 4 3 3 4 4 5 4 5 3 NA ...
 $ q9                   : num [1:603] 3 1 4 NA 4 5 3 4 3 NA ...
 $ q10                  : num [1:603] 5 4 4 3 4 5 4 5 2 NA ...
 $ TimeSpent            : num [1:603] 1021 794 2322 1482 769 ...
 $ TimeSpent_hours      : num [1:603] 17 13.2 38.7 24.7 12.8 ...
 $ TimeSpent_std        : num [1:603] -0.575 -0.742 0.385 -0.235 -0.761 ...
 $ int                  : num [1:603] 4.6 3.4 3.6 3.8 4.33 ...
 $ pc                   : num [1:603] 3.5 3 3.5 3.5 3.5 ...
 $ uv                   : num [1:603] 3.33 2.67 4 3.5 4.11 ...
 - attr(*, "spec")=
  .. cols(
  ..   student_id = col_double(),
  ..   course_id = col_character(),
  ..   total_points_possible = col_double(),
  ..   total_points_earned = col_double(),
  ..   percentage_earned = col_double(),
  ..   subject = col_character(),
  ..   semester = col_character(),
  ..   section = col_character(),
  ..   Gradebook_Item = col_character(),
  ..   Grade_Category = col_logical(),
  ..   FinalGradeCEMS = col_double(),
  ..   Points_Possible = col_double(),
  ..   Points_Earned = col_double(),
  ..   Gender = col_character(),
  ..   q1 = col_double(),
  ..   q2 = col_double(),
  ..   q3 = col_double(),
  ..   q4 = col_double(),
  ..   q5 = col_double(),
  ..   q6 = col_double(),
  ..   q7 = col_double(),
  ..   q8 = col_double(),
  ..   q9 = col_double(),
  ..   q10 = col_double(),
  ..   TimeSpent = col_double(),
  ..   TimeSpent_hours = col_double(),
  ..   TimeSpent_std = col_double(),
  ..   int = col_double(),
  ..   pc = col_double(),
  ..   uv = col_double()
  .. )
 - attr(*, "problems")=<externalptr>
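
The prompt can be read two ways: descending order for both columns, or ascending subject with descending percentage_earned within each subject. The sketch below shows how the two readings differ, using a made-up four-row tibble (toy) with hypothetical values.

```r
library(dplyr)

# Hypothetical values, for illustration only
toy <- tibble(
  subject           = c("PhysA", "OcnA", "PhysA", "OcnA"),
  percentage_earned = c(0.90, 0.76, 0.87, 0.68)
)

# Reading 1: both columns descending (subjects sorted Z to A)
both_desc <- toy %>%
  arrange(desc(subject), desc(percentage_earned))

# Reading 2: subjects A to Z, highest percentage first within each subject
subject_asc <- toy %>%
  arrange(subject, desc(percentage_earned))

both_desc
subject_asc
```

In both cases the first column listed in arrange() is the primary sort key, and later columns only break ties.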

In the code chunk named final-wrangle, use the sci_classes data and the %>% pipe operator:

Select student_id, subject, semester, FinalGradeCEMS.
Filter for students in OcnA courses.
Arrange grades by section in descending order.
Assign to a new object.
Examine the contents using a method of your choosing.

#Selecting just student_id, subject, semester, FinalGradeCEMS, filtering by OcnA and arranging in descending order by Grade
read_data7 <- read_data %>%
  select(student_id, subject, semester, FinalGradeCEMS) %>%
  filter(subject == "OcnA") %>%
  arrange(desc(FinalGradeCEMS))

#Inspect data 
read_data7

# A tibble: 111 × 4
   student_id subject semester FinalGradeCEMS
        <dbl> <chr>   <chr>             <dbl>
 1      66740 OcnA    S116               99.3
 2      91163 OcnA    S216               97.4
 3      94744 OcnA    S216               96.8
 4      91818 OcnA    S116               96.5
 5      90090 OcnA    S116               96.3
 6      88168 OcnA    S116               96.0
 7      89114 OcnA    S116               95.0
 8      86758 OcnA    S116               94.6
 9      68476 OcnA    S116               94.6
10      79893 OcnA    T116               94.5
# ℹ 101 more rows
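
To see what the pipe adds, the same three steps can be written as nested function calls. The sketch below compares the two styles on a four-row tibble (toy, a name made up here) built from rows that appear in the outputs above. The pipe passes the result on its left as the first argument of the function on its right, so the steps read top to bottom instead of inside out.

```r
library(dplyr)

# Four rows taken from the outputs shown earlier in this document
toy <- tibble(
  student_id     = c(66740, 44638, 43146, 91163),
  subject        = c("OcnA", "OcnA", "FrScA", "OcnA"),
  semester       = c("S116", "S116", "S216", "S216"),
  FinalGradeCEMS = c(99.3, 81.7, 93.5, 97.4)
)

# Piped version: select, then filter, then arrange
piped <- toy %>%
  select(student_id, subject, semester, FinalGradeCEMS) %>%
  filter(subject == "OcnA") %>%
  arrange(desc(FinalGradeCEMS))

# Nested version: the same steps, read from the innermost call outward
nested <- arrange(
  filter(
    select(toy, student_id, subject, semester, FinalGradeCEMS),
    subject == "OcnA"
  ),
  desc(FinalGradeCEMS)
)

piped
nested
```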

Render & Submit

Congratulations, you’ve completed Foundations Learning Badge 1!

To receive your Foundations Badge, you will need to render this document and publish it via a method designated by your instructor, such as Quarto Pub, Posit Cloud, RPubs, GitHub Pages, or another method. Once you have shared a link to your published document with your instructor and they have reviewed your work, you will be provided a physical or digital version of the badge pictured at the top of this document!

If you have any questions about this badge, or run into any technical issues, don’t hesitate to contact your instructor.

Complete the following steps to submit your work for review:

First, change the name after author: in the YAML header at the very top of this document to your name. The YAML header controls the look and feel of the knitted document but doesn’t actually display in the final output.
Next, click the knit button in the toolbar above to “knit” your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let us know if you run into any issues with knitting.
Finally, publish.