The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.
To earn a badge for each lab, you are required to respond to a set of prompts for two parts:
In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will complete a few R exercises that demonstrate your ability to apply the first phases of the LA workflow and the data wrangling techniques introduced in this learning lab.
Use the institutional library (e.g. NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies learning analytics to an educational context or topic of interest. More specifically, locate a study that makes use of the Learning Analytics Workflow we learned today. You are also welcome to select one of your own research papers.
Provide an APA citation for your selected study.
What educational issue, “problem of practice,” and/or questions were addressed?
Briefly describe any steps of the data-intensive research workflow that are detailed in your article or presentation.
What were the key findings or conclusions? What value, if any, might education practitioners find in these results?
Finally, how, if at all, were educators in your self-selected article involved prior to wrangling and analysis?
Draft a new research question guided by the phases of the Learning Analytics Workflow. Or use one of your current research questions.
What educational issue, “problem of practice,” and/or questions are addressed?
Briefly describe any steps of the data-intensive research workflow that can be detailed in your article or presentation.
How, if at all, will your article touch upon the application(s) of LA to “understand and improve learning and the contexts in which learning occurs?”
In our Learning Analytics code-along, we only scratched the surface of the ways we can wrangle data.
Using one of the data sets provided in the data folder, your goal for this lab is to extend the Learning Analytics Workflow from our code-along by preparing and wrangling different data.
Alternatively, you may use your own data set in the workflow. If you decide to use your own data set, you must include the following:
Show two different ways of using the select function with your data, inspect the result, and save it as a new object.
Show one way to use the filter function with your data, inspect the result, and save it as a new object.
Show one way to use the arrange function with your data, inspect the result, and save it as a new object.
Use the pipe operator to bring it all together.
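As a rough sketch of what those requirements could look like, the example below runs each step on a small made-up tibble. The object my_data and its columns are hypothetical stand-ins for your own data set, and starts_with() is just one of several selection helpers you might choose for the second select approach.

```r
library(dplyr)

# Hypothetical stand-in for your own data set
my_data <- tibble(
  id = c(1, 2, 3, 4),
  group = c("a", "b", "a", "b"),
  score_pre = c(10, 12, 9, 15),
  score_post = c(14, 11, 13, 18)
)

# select, way 1: name the columns to keep
by_name <- select(my_data, id, score_post)

# select, way 2: use a selection helper that matches column names
by_helper <- select(my_data, id, starts_with("score"))

# filter: keep only the rows in group "a"
group_a <- filter(my_data, group == "a")

# arrange: sort from highest to lowest post score
sorted <- arrange(my_data, desc(score_post))

# pipe: chain the same kinds of steps into one pipeline
wrangled <- my_data %>%
  select(id, group, score_post) %>%
  filter(group == "a") %>%
  arrange(desc(score_post))

# inspect the final object
wrangled
```

Each intermediate result is saved under its own name, so the original my_data stays unchanged and every step can be inspected separately.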
Feel free to create a new script in your lab 2 folder to work through the following problems. Then, when you are satisfied, add the code to the code chunks below. Don't forget to run the code to make sure it works.
Instructions:
Add your name to the author field of the document.
Set up the first phase (or the first two, if using an Introduction) of the LA workflow below. I’ve added the wrangle section for you. You will need to Prepare the libraries necessary to wrangle the data.
read-data: Import sci-online-classes.csv from the data folder and save it as a new object called sci_classes. Then inspect your data using a function of your choice.
# Type your code here
install.packages("readr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(readr)
sci_classes <- read_csv("data/sci-online-classes.csv")
## Rows: 603 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): course_id, subject, semester, section, Gradebook_Item, Gender
## dbl (23): student_id, total_points_possible, total_points_earned, percentage...
## lgl (1): Grade_Category
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(sci_classes, 5)
## # A tibble: 5 × 30
## student_id course_id total…¹ total…² perce…³ subject semes…⁴ section Grade…⁵
## <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 43146 FrScA-S216… 3280 2220 0.677 FrScA S216 02 POINTS…
## 2 44638 OcnA-S116-… 3531 2672 0.757 OcnA S116 01 ATTEMP…
## 3 47448 FrScA-S216… 2870 1897 0.661 FrScA S216 01 POINTS…
## 4 47979 OcnA-S216-… 4562 3090 0.677 OcnA S216 01 POINTS…
## 5 48797 PhysA-S116… 2207 1910 0.865 PhysA S116 01 POINTS…
## # … with 21 more variables: Grade_Category <lgl>, FinalGradeCEMS <dbl>,
## # Points_Possible <dbl>, Points_Earned <dbl>, Gender <chr>, q1 <dbl>,
## # q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>,
## # q9 <dbl>, q10 <dbl>, TimeSpent <dbl>, TimeSpent_hours <dbl>,
## # TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>, and abbreviated
## # variable names ¹total_points_possible, ²total_points_earned,
## # ³percentage_earned, ⁴semester, ⁵Gradebook_Item
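The import message above suggests setting show_col_types = FALSE to quiet the column specification. The sketch below tries this on a tiny inline CSV, using a few values copied from the first two rows shown above, rather than the real file. It assumes readr 2.0 or later, where wrapping a string in I() makes read_csv() treat it as literal data instead of a file path; the object names csv_text and toy are illustrative only.

```r
library(readr)

# Small inline CSV standing in for a file on disk
csv_text <- "student_id,subject,percentage_earned
43146,FrScA,0.677
44638,OcnA,0.757"

# I() marks the string as literal data rather than a path
toy <- read_csv(I(csv_text), show_col_types = FALSE)

dim(toy)    # number of rows, then number of columns
names(toy)  # column names as read from the header line
```

Checking dim() and names() right after import is a quick way to confirm that the file was parsed with the expected shape before any wrangling starts.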
select-1: Use select() to keep the following columns: student_id, subject, semester, FinalGradeCEMS. Assign the result to a new object with a different name (you choose the name).
# Type your code here
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
selected_data <- select(sci_classes, student_id, subject, semester, FinalGradeCEMS)
head(selected_data)
## # A tibble: 6 × 4
## student_id subject semester FinalGradeCEMS
## <dbl> <chr> <chr> <dbl>
## 1 43146 FrScA S216 93.5
## 2 44638 OcnA S116 81.7
## 3 47448 FrScA S216 88.5
## 4 47979 OcnA S216 81.9
## 5 48797 PhysA S116 84
## 6 51943 FrScA S216 NA
selected_grade <- sci_classes %>% select(student_id, subject, semester, FinalGradeCEMS)
head(selected_grade)
## # A tibble: 6 × 4
## student_id subject semester FinalGradeCEMS
## <dbl> <chr> <chr> <dbl>
## 1 43146 FrScA S216 93.5
## 2 44638 OcnA S116 81.7
## 3 47448 FrScA S216 88.5
## 4 47979 OcnA S216 81.9
## 5 48797 PhysA S116 84
## 6 51943 FrScA S216 NA
What do you notice about FinalGradeCEMS? (Hint: NAs?)
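One way to follow up on that hint is to count the missing values directly. The sketch below uses a toy tibble with illustrative values, not the real sci_classes data, and shows why NAs matter for later summaries.

```r
library(dplyr)

# Toy data with one missing grade (values are illustrative only)
grades <- tibble(
  student_id = c(1, 2, 3),
  FinalGradeCEMS = c(93.5, NA, 88.5)
)

# is.na() flags missing values; sum() counts the TRUEs
n_missing <- sum(is.na(grades$FinalGradeCEMS))
n_missing

# mean() returns NA unless missing values are dropped explicitly
mean(grades$FinalGradeCEMS)
mean(grades$FinalGradeCEMS, na.rm = TRUE)
```

The same sum(is.na(...)) pattern could be applied to any column of your own data to see how much is missing before deciding how to handle it.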
select-2: Select all columns except subject and section. Assign the result to a new object with a different name. Examine your data frame with a different function.
# Type your code here
selected_data_2 <- select(sci_classes, -subject, -section)
head(selected_data_2)
## # A tibble: 6 × 28
## student_id course_id total…¹ total…² perce…³ semes…⁴ Grade…⁵ Grade…⁶ Final…⁷
## <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr> <lgl> <dbl>
## 1 43146 FrScA-S216… 3280 2220 0.677 S216 POINTS… NA 93.5
## 2 44638 OcnA-S116-… 3531 2672 0.757 S116 ATTEMP… NA 81.7
## 3 47448 FrScA-S216… 2870 1897 0.661 S216 POINTS… NA 88.5
## 4 47979 OcnA-S216-… 4562 3090 0.677 S216 POINTS… NA 81.9
## 5 48797 PhysA-S116… 2207 1910 0.865 S116 POINTS… NA 84
## 6 51943 FrScA-S216… 4208 3596 0.855 S216 POINTS… NA NA
## # … with 19 more variables: Points_Possible <dbl>, Points_Earned <dbl>,
## # Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
## # q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
## # TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>,
## # and abbreviated variable names ¹total_points_possible,
## # ²total_points_earned, ³percentage_earned, ⁴semester, ⁵Gradebook_Item,
## # ⁶Grade_Category, ⁷FinalGradeCEMS
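For comparison, here is a small sketch of two equivalent ways to drop columns, inspected with glimpse() instead of head(). The tibble toy is made up for illustration, and the !c(...) form assumes dplyr 1.0 or later.

```r
library(dplyr)

# Made-up data with the same kinds of columns as the exercise
toy <- tibble(
  student_id = c(1, 2),
  subject = c("OcnA", "FrScA"),
  section = c("01", "02"),
  semester = c("S116", "S216")
)

# Drop columns with a minus sign in front of each name
drop_minus <- select(toy, -subject, -section)

# Drop columns by negating a whole selection
drop_bang <- select(toy, !c(subject, section))

# Both approaches keep the same columns
identical(names(drop_minus), names(drop_bang))

# glimpse() prints one line per column, so nothing is abbreviated
glimpse(drop_minus)
```

Because glimpse() shows columns vertically, it avoids the truncated column names that appear in wide head() output.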
filter-1: Filter the sci_classes data frame for students in OcnA courses. Assign the result to a new object with a different name. Use the head() function to examine your data frame.
# Type your code here
filtered_data <- filter(sci_classes, subject == "OcnA")
head(filtered_data)
## # A tibble: 6 × 30
## student_id course_id total…¹ total…² perce…³ subject semes…⁴ section Grade…⁵
## <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 44638 OcnA-S116-… 3531 2672 0.757 OcnA S116 01 ATTEMP…
## 2 47979 OcnA-S216-… 4562 3090 0.677 OcnA S216 01 POINTS…
## 3 54066 OcnA-S116-… 4641 3429 0.739 OcnA S116 01 ATTEMP…
## 4 54282 OcnA-S116-… 3581 2777 0.775 OcnA S116 02 POINTS…
## 5 54342 OcnA-S116-… 3256 2876 0.883 OcnA S116 02 POINTS…
## 6 54346 OcnA-S116-… 4471 3773 0.844 OcnA S116 01 ATTEMP…
## # … with 21 more variables: Grade_Category <lgl>, FinalGradeCEMS <dbl>,
## # Points_Possible <dbl>, Points_Earned <dbl>, Gender <chr>, q1 <dbl>,
## # q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>,
## # q9 <dbl>, q10 <dbl>, TimeSpent <dbl>, TimeSpent_hours <dbl>,
## # TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>, and abbreviated
## # variable names ¹total_points_possible, ²total_points_earned,
## # ³percentage_earned, ⁴semester, ⁵Gradebook_Item
Q: How many rows does the head() function display? Hint: Check the dimensions of your tibble in the console.
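A quick way to check this for yourself is to compare the row count of head() with the row count of the full data frame. The sketch below uses the built-in mtcars data set purely as a stand-in.

```r
# mtcars is a built-in data set with 32 rows
nrow(mtcars)

# head() returns the first 6 rows unless n is supplied
nrow(head(mtcars))
nrow(head(mtcars, n = 10))

# dim() reports rows and columns of the full data frame
dim(mtcars)
```

The same calls work on a tibble, so running dim() on your filtered object shows how many rows exist beyond the ones head() prints.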
filter-2: Filter the sci_classes data frame so that rows with NA for points earned are removed. Assign the result to a new object with a different name. Use glimpse() to examine all columns of your data frame.
# Type your code here
filtered_data_2 <- filter(sci_classes, !is.na(Points_Earned))
glimpse(filtered_data_2)
## Rows: 511
## Columns: 30
## $ student_id <dbl> 44638, 47979, 48797, 52446, 53447, 53475, 53475,…
## $ course_id <chr> "OcnA-S116-01", "OcnA-S216-01", "PhysA-S116-01",…
## $ total_points_possible <dbl> 3531, 4562, 2207, 2086, 4655, 1710, 1209, 4641, …
## $ total_points_earned <dbl> 2672, 3090, 1910, 1719, 3149, 1402, 977, 3429, 2…
## $ percentage_earned <dbl> 0.7567261, 0.6773345, 0.8654282, 0.8240652, 0.67…
## $ subject <chr> "OcnA", "OcnA", "PhysA", "PhysA", "FrScA", "FrSc…
## $ semester <chr> "S116", "S216", "S116", "S116", "S116", "S116", …
## $ section <chr> "01", "01", "01", "01", "01", "02", "01", "01", …
## $ Gradebook_Item <chr> "ATTEMPTED", "POINTS EARNED & TOTAL COURSE POINT…
## $ Grade_Category <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ FinalGradeCEMS <dbl> 81.70184, 81.85260, 84.00000, 97.77778, 96.11872…
## $ Points_Possible <dbl> 10, 5, 438, 10, 443, 5, 12, 10, 5, 10, 220, 30, …
## $ Points_Earned <dbl> 10.00, 4.00, 399.00, 10.00, 425.00, 2.50, 12.00,…
## $ Gender <chr> "F", "M", "F", "F", "F", "M", "M", "M", "F", "F"…
## $ q1 <dbl> 4, 5, 4, 3, 4, NA, NA, 4, 3, 5, NA, 4, 4, NA, 4,…
## $ q2 <dbl> 4, 5, 3, 3, 3, NA, NA, 5, 3, 3, NA, 2, 4, NA, 3,…
## $ q3 <dbl> 3, 3, 3, 3, 3, NA, NA, 3, 3, 5, NA, 2, 3, NA, 3,…
## $ q4 <dbl> 4, 5, 4, 3, 4, NA, NA, 5, 3, 5, NA, 4, 5, NA, 4,…
## $ q5 <dbl> 4, 5, 4, 3, 4, NA, NA, 5, 4, 5, NA, 4, 4, NA, 4,…
## $ q6 <dbl> 4, 5, 4, 4, 3, NA, NA, 5, 3, 5, NA, 4, 4, NA, 3,…
## $ q7 <dbl> 4, 4, 4, 3, 3, NA, NA, 5, 3, 5, NA, 4, 5, NA, 3,…
## $ q8 <dbl> 5, 5, 4, 3, 4, NA, NA, 4, 3, 5, NA, 4, 4, NA, 4,…
## $ q9 <dbl> 4, 5, NA, 3, 2, NA, NA, 5, 2, 2, NA, 2, 4, NA, 2…
## $ q10 <dbl> 4, 5, 3, 3, 5, NA, NA, 4, 4, 5, NA, 4, 4, NA, 3,…
## $ TimeSpent <dbl> 1382.7001, 1598.6166, 1481.8000, 1390.2167, 1479…
## $ TimeSpent_hours <dbl> 23.04500167, 26.64361000, 24.69666667, 23.170278…
## $ TimeSpent_std <dbl> -0.30780313, -0.14844697, -0.23466291, -0.302255…
## $ int <dbl> 4.2, 5.0, 3.8, 3.0, 4.2, NA, NA, 4.4, 3.4, 4.7, …
## $ pc <dbl> 3.50, 3.50, 3.50, 3.00, 3.00, NA, NA, 4.00, 3.00…
## $ uv <dbl> 4.000000, 5.000000, 3.500000, 3.333333, 2.666667…
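The glimpse() output above reports 511 rows, against 603 rows in the original import, so 603 - 511 = 92 rows had a missing Points_Earned. The sketch below repeats that before-and-after check on a made-up tibble, and also shows why comparing against NA with != does not work as a substitute for is.na().

```r
library(dplyr)

# Made-up data with two missing values
toy <- tibble(
  student_id = c(1, 2, 3, 4),
  Points_Earned = c(10, NA, 4, NA)
)

# Keep rows where Points_Earned is not missing
kept <- filter(toy, !is.na(Points_Earned))

# Rows removed = rows before - rows after
n_removed <- nrow(toy) - nrow(kept)
n_removed

# Pitfall: any comparison with NA yields NA, and filter() drops those rows
wrong <- filter(toy, Points_Earned != NA)
nrow(wrong)
```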
In the code chunk called arrange-1, arrange the sci_classes data by subject, then by percentage_earned in descending order. Assign the result to a new object. Use the str() function to examine the data type of each column in your data frame.
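To make the two-key sort concrete, here is a toy sketch with made-up values. It sorts by subject first, then by percentage_earned from high to low within each subject, and includes one NA to show where missing values end up.

```r
library(dplyr)

# Made-up data: two subjects, one missing percentage
toy <- tibble(
  subject = c("OcnA", "AnPhA", "OcnA", "AnPhA", "OcnA"),
  percentage_earned = c(0.76, 0.91, 0.88, NA, 0.68)
)

# First key sorts ascending; desc() reverses the second key
sorted <- arrange(toy, subject, desc(percentage_earned))
sorted
```

Note that arrange() places NA values at the end of their group even when desc() is used, so a missing percentage never appears as the "highest" value.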
arranged_data <- arrange(sci_classes, subject, desc(percentage_earned))
str(arranged_data)
## spc_tbl_ [603 × 30] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ student_id : num [1:603] 70192 86488 96690 91175 86267 ...
## $ course_id : chr [1:603] "AnPhA-S116-02" "AnPhA-S116-01" "AnPhA-S216-01" "AnPhA-S116-02" ...
## $ total_points_possible: num [1:603] 1936 3342 4804 3199 3045 ...
## $ total_points_earned : num [1:603] 1763 3033 4309 2867 2705 ...
## $ percentage_earned : num [1:603] 0.911 0.908 0.897 0.896 0.888 ...
## $ subject : chr [1:603] "AnPhA" "AnPhA" "AnPhA" "AnPhA" ...
## $ semester : chr [1:603] "S116" "S116" "S216" "S116" ...
## $ section : chr [1:603] "02" "01" "01" "02" ...
## $ Gradebook_Item : chr [1:603] "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" ...
## $ Grade_Category : logi [1:603] NA NA NA NA NA NA ...
## $ FinalGradeCEMS : num [1:603] 96 87.4 64.8 82.2 35.1 ...
## $ Points_Possible : num [1:603] 10 28 10 5 50 15 10 10 353 460 ...
## $ Points_Earned : num [1:603] 7 26 3 5 50 11 8 10 330 452 ...
## $ Gender : chr [1:603] "F" "M" "F" "F" ...
## $ q1 : num [1:603] 4 4 4 5 5 4 5 4 NA NA ...
## $ q2 : num [1:603] 3 4 3 3 5 2 4 4 NA NA ...
## $ q3 : num [1:603] 3 2 2 3 3 3 4 3 NA NA ...
## $ q4 : num [1:603] 4 3 5 5 5 4 5 4 NA NA ...
## $ q5 : num [1:603] 4 3 4 5 5 4 5 4 NA NA ...
## $ q6 : num [1:603] 3 3 4 4 5 3 5 4 NA NA ...
## $ q7 : num [1:603] 3 3 3 3 4 4 5 4 NA NA ...
## $ q8 : num [1:603] 5 2 4 5 5 4 4 4 NA NA ...
## $ q9 : num [1:603] 2 3 3 3 5 1 4 4 NA NA ...
## $ q10 : num [1:603] 5 3 2 5 5 2 5 4 NA NA ...
## $ TimeSpent : num [1:603] 1537 3600 1970 1315 406 ...
## $ TimeSpent_hours : num [1:603] 25.62 60 32.83 21.92 6.77 ...
## $ TimeSpent_std : num [1:603] -0.194 1.328 0.125 -0.358 -1.029 ...
## $ int : num [1:603] 4.4 3 3.8 5 5 3.9 4.6 4 4.8 4.6 ...
## $ pc : num [1:603] 3 2.5 2.5 3 3.5 3.5 3.75 3.5 3.5 4.5 ...
## $ uv : num [1:603] 2.67 3.33 3.33 3.33 5 ...
## - attr(*, "spec")=
## .. cols(
## .. student_id = col_double(),
## .. course_id = col_character(),
## .. total_points_possible = col_double(),
## .. total_points_earned = col_double(),
## .. percentage_earned = col_double(),
## .. subject = col_character(),
## .. semester = col_character(),
## .. section = col_character(),
## .. Gradebook_Item = col_character(),
## .. Grade_Category = col_logical(),
## .. FinalGradeCEMS = col_double(),
## .. Points_Possible = col_double(),
## .. Points_Earned = col_double(),
## .. Gender = col_character(),
## .. q1 = col_double(),
## .. q2 = col_double(),
## .. q3 = col_double(),
## .. q4 = col_double(),
## .. q5 = col_double(),
## .. q6 = col_double(),
## .. q7 = col_double(),
## .. q8 = col_double(),
## .. q9 = col_double(),
## .. q10 = col_double(),
## .. TimeSpent = col_double(),
## .. TimeSpent_hours = col_double(),
## .. TimeSpent_std = col_double(),
## .. int = col_double(),
## .. pc = col_double(),
## .. uv = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
In the code chunk named final-wrangle, use the sci_classes data and the %>% pipe operator to select the following columns: student_id, subject, semester, FinalGradeCEMS.
# Type your code here
# final-wrangle
final_data <- sci_classes %>%
  select(student_id, subject, semester, FinalGradeCEMS) %>%
  filter(subject == "OcnA") %>%
  arrange(desc(FinalGradeCEMS))
head(final_data)
## # A tibble: 6 × 4
## student_id subject semester FinalGradeCEMS
## <dbl> <chr> <chr> <dbl>
## 1 66740 OcnA S116 99.3
## 2 91163 OcnA S216 97.4
## 3 94744 OcnA S216 96.8
## 4 91818 OcnA S116 96.5
## 5 90090 OcnA S116 96.3
## 6 88168 OcnA S116 96.0
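To see what the pipe is doing, it can help to compare a piped pipeline with the same steps written as nested function calls. The sketch below does this on a made-up tibble; the object names toy, piped, and nested are illustrative only.

```r
library(dplyr)

# Made-up data shaped like the selected columns
toy <- tibble(
  student_id = c(1, 2, 3, 4),
  subject = c("OcnA", "FrScA", "OcnA", "OcnA"),
  FinalGradeCEMS = c(81.7, 93.5, 96.5, NA)
)

# Piped version: steps read top to bottom
piped <- toy %>%
  select(student_id, subject, FinalGradeCEMS) %>%
  filter(subject == "OcnA") %>%
  arrange(desc(FinalGradeCEMS))

# Nested version: the same steps, read from the inside out
nested <- arrange(
  filter(
    select(toy, student_id, subject, FinalGradeCEMS),
    subject == "OcnA"
  ),
  desc(FinalGradeCEMS)
)

# Both versions produce the same result
identical(piped, nested)
```

The pipe passes the result on its left as the first argument of the function on its right, which is why the two versions match while the piped one is easier to read.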
Congratulations, you’ve completed your Foundation Badge on Learning Analytics Workflow! Complete the following steps to submit your work for review by
Change the name in the author: field of the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.
Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.
Commit your changes in GitHub Desktop and push them to your online GitHub repository.
Publish your HTML page to the web using one of the following publishing methods:
Publish on RPubs by clicking the “Publish” button located in the Viewer Pane when you knit your document. Note that you will need to quickly create an RPubs account.
Publish on GitHub using either GitHub Pages or the HTML previewer.
Post a new discussion on GitHub to our Foundations Badges forum. In your post, include a link to your published web page and write a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.