Introduction to R: Basic Syntax and Data Manipulation

Learning Analytics I

Author

Abby Shoulders

Published

September 15, 2025

Learning Objectives

By the end of this lesson, you will be able to:

- Use basic R syntax.

- Load and save data files.

- Wrangle data by cleaning and manipulating it for analysis.

- Perform basic data visualization using ggplot2 (you will learn more about this in Module 3).

Part 1: Setup and Loading Data

To get started, we’ll use a collection of R packages called the tidyverse. The tidyverse is essential for data science in R because it provides a consistent set of tools for cleaning, manipulating, and visualizing data.

If you don’t have the tidyverse installed, loading it will produce an error message when you click the play button. In that case, run install.packages("tidyverse") in the Console below, then load the library again.
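The setup step described above can be sketched as follows. This is only one way to do it: the requireNamespace() check is an addition that is not part of the original lesson, and it installs the package only when it is missing.

```r
# Install the tidyverse only if it is not already available
# (this check is an illustrative addition, not part of the lesson's chunk)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load the tidyverse (attaches dplyr, readr, ggplot2, and others)
library(tidyverse)
```

If the library() call runs without an error, functions such as read_csv(), select(), and filter() are ready to use.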

For this lesson, we will use the dataset(s) provided in the data folder in the Files pane. Your goal for this module is to familiarize yourself with your data.

Alternatively, you may use your own dataset. If you do, you will first need to load it into R before you can work with it.

Part 2: Wrangle Your Data

Data wrangling is the process of cleaning, structuring, and enriching raw data into a format that’s more suitable for analysis. For our practice, we’ll use the sci-online-classes.csv file.

In the chunk called read-data below:

  1. Load the sci-online-classes.csv file from the data folder (check the Files pane on the right side) and save it as a new object called data. We’ll use the read_csv() function from the tidyverse for this, which is a faster and more consistent way to load CSV files than the default R function, read.csv().
  2. Then inspect your data using a function of your choice.

Also, in R, a pound sign (#) indicates a comment that will not be evaluated when you run the code. You can use comments as many times as you want.
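Before running the chunk on the real file, the two steps above can be tried on a throwaway CSV. The sketch below is an illustration under stated assumptions: the file contents and the object name toy are made up, and show_col_types = FALSE is the option that the read_csv() message itself suggests for quieting the column-specification output.

```r
library(readr)

# Write a tiny CSV to a temporary file (made-up values, not the course data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("student_id,subject", "1,OcnA", "2,PhysA"), tmp)

# Read it back; show_col_types = FALSE silences the column-specification message
toy <- read_csv(tmp, show_col_types = FALSE)

# Inspect the result
head(toy)  # first rows
str(toy)   # column types
```

The same pattern applies to the lesson's file: only the path passed to read_csv() changes.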

# Import/load the dataset
data <- read_csv("data/sci-online-classes.csv")
Rows: 603 Columns: 30
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): course_id, subject, semester, section, Gradebook_Item, Gender
dbl (23): student_id, total_points_possible, total_points_earned, percentage...
lgl  (1): Grade_Category

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Inspect your data (feel free to use head(), str(), or summary())
head(data)
# A tibble: 6 × 30
  student_id course_id     total_points_possible total_points_earned
       <dbl> <chr>                         <dbl>               <dbl>
1      43146 FrScA-S216-02                  3280                2220
2      44638 OcnA-S116-01                   3531                2672
3      47448 FrScA-S216-01                  2870                1897
4      47979 OcnA-S216-01                   4562                3090
5      48797 PhysA-S116-01                  2207                1910
6      51943 FrScA-S216-03                  4208                3596
# ℹ 26 more variables: percentage_earned <dbl>, subject <chr>, semester <chr>,
#   section <chr>, Gradebook_Item <chr>, Grade_Category <lgl>,
#   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
#   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
#   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
#   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>
# TRY OTHER FUNCTIONS BELOW (e.g., head(data))
# Pass the data object itself; tail(head) would print the definition of head() instead
tail(data)
 str(data)
spc_tbl_ [603 × 30] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ student_id           : num [1:603] 43146 44638 47448 47979 48797 ...
 $ course_id            : chr [1:603] "FrScA-S216-02" "OcnA-S116-01" "FrScA-S216-01" "OcnA-S216-01" ...
 $ total_points_possible: num [1:603] 3280 3531 2870 4562 2207 ...
 $ total_points_earned  : num [1:603] 2220 2672 1897 3090 1910 ...
 $ percentage_earned    : num [1:603] 0.677 0.757 0.661 0.677 0.865 ...
 $ subject              : chr [1:603] "FrScA" "OcnA" "FrScA" "OcnA" ...
 $ semester             : chr [1:603] "S216" "S116" "S216" "S216" ...
 $ section              : chr [1:603] "02" "01" "01" "01" ...
 $ Gradebook_Item       : chr [1:603] "POINTS EARNED & TOTAL COURSE POINTS" "ATTEMPTED" "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" ...
 $ Grade_Category       : logi [1:603] NA NA NA NA NA NA ...
 $ FinalGradeCEMS       : num [1:603] 93.5 81.7 88.5 81.9 84 ...
 $ Points_Possible      : num [1:603] 5 10 10 5 438 5 10 10 443 5 ...
 $ Points_Earned        : num [1:603] NA 10 NA 4 399 NA NA 10 425 2.5 ...
 $ Gender               : chr [1:603] "M" "F" "M" "M" ...
 $ q1                   : num [1:603] 5 4 5 5 4 NA 5 3 4 NA ...
 $ q2                   : num [1:603] 4 4 4 5 3 NA 5 3 3 NA ...
 $ q3                   : num [1:603] 4 3 4 3 3 NA 3 3 3 NA ...
 $ q4                   : num [1:603] 5 4 5 5 4 NA 5 3 4 NA ...
 $ q5                   : num [1:603] 5 4 5 5 4 NA 5 3 4 NA ...
 $ q6                   : num [1:603] 5 4 4 5 4 NA 5 4 3 NA ...
 $ q7                   : num [1:603] 5 4 4 4 4 NA 4 3 3 NA ...
 $ q8                   : num [1:603] 5 5 5 5 4 NA 5 3 4 NA ...
 $ q9                   : num [1:603] 4 4 3 5 NA NA 5 3 2 NA ...
 $ q10                  : num [1:603] 5 4 5 5 3 NA 5 3 5 NA ...
 $ TimeSpent            : num [1:603] 1555 1383 860 1599 1482 ...
 $ TimeSpent_hours      : num [1:603] 25.9 23 14.3 26.6 24.7 ...
 $ TimeSpent_std        : num [1:603] -0.181 -0.308 -0.693 -0.148 -0.235 ...
 $ int                  : num [1:603] 5 4.2 5 5 3.8 4.6 5 3 4.2 NA ...
 $ pc                   : num [1:603] 4.5 3.5 4 3.5 3.5 4 3.5 3 3 NA ...
 $ uv                   : num [1:603] 4.33 4 3.67 5 3.5 ...
 - attr(*, "spec")=
  .. cols(
  ..   student_id = col_double(),
  ..   course_id = col_character(),
  ..   total_points_possible = col_double(),
  ..   total_points_earned = col_double(),
  ..   percentage_earned = col_double(),
  ..   subject = col_character(),
  ..   semester = col_character(),
  ..   section = col_character(),
  ..   Gradebook_Item = col_character(),
  ..   Grade_Category = col_logical(),
  ..   FinalGradeCEMS = col_double(),
  ..   Points_Possible = col_double(),
  ..   Points_Earned = col_double(),
  ..   Gender = col_character(),
  ..   q1 = col_double(),
  ..   q2 = col_double(),
  ..   q3 = col_double(),
  ..   q4 = col_double(),
  ..   q5 = col_double(),
  ..   q6 = col_double(),
  ..   q7 = col_double(),
  ..   q8 = col_double(),
  ..   q9 = col_double(),
  ..   q10 = col_double(),
  ..   TimeSpent = col_double(),
  ..   TimeSpent_hours = col_double(),
  ..   TimeSpent_std = col_double(),
  ..   int = col_double(),
  ..   pc = col_double(),
  ..   uv = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Remember, once the chunk is ready, click the play button at the top right of the chunk.

Exploring Data

Before cleaning, let’s explore the dataset.

# Check the structure of the data, including variable types
str(data)
spc_tbl_ [603 × 30] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ student_id           : num [1:603] 43146 44638 47448 47979 48797 ...
 $ course_id            : chr [1:603] "FrScA-S216-02" "OcnA-S116-01" "FrScA-S216-01" "OcnA-S216-01" ...
 $ total_points_possible: num [1:603] 3280 3531 2870 4562 2207 ...
 $ total_points_earned  : num [1:603] 2220 2672 1897 3090 1910 ...
 $ percentage_earned    : num [1:603] 0.677 0.757 0.661 0.677 0.865 ...
 $ subject              : chr [1:603] "FrScA" "OcnA" "FrScA" "OcnA" ...
 $ semester             : chr [1:603] "S216" "S116" "S216" "S216" ...
 $ section              : chr [1:603] "02" "01" "01" "01" ...
 $ Gradebook_Item       : chr [1:603] "POINTS EARNED & TOTAL COURSE POINTS" "ATTEMPTED" "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" ...
 $ Grade_Category       : logi [1:603] NA NA NA NA NA NA ...
 $ FinalGradeCEMS       : num [1:603] 93.5 81.7 88.5 81.9 84 ...
 $ Points_Possible      : num [1:603] 5 10 10 5 438 5 10 10 443 5 ...
 $ Points_Earned        : num [1:603] NA 10 NA 4 399 NA NA 10 425 2.5 ...
 $ Gender               : chr [1:603] "M" "F" "M" "M" ...
 $ q1                   : num [1:603] 5 4 5 5 4 NA 5 3 4 NA ...
 $ q2                   : num [1:603] 4 4 4 5 3 NA 5 3 3 NA ...
 $ q3                   : num [1:603] 4 3 4 3 3 NA 3 3 3 NA ...
 $ q4                   : num [1:603] 5 4 5 5 4 NA 5 3 4 NA ...
 $ q5                   : num [1:603] 5 4 5 5 4 NA 5 3 4 NA ...
 $ q6                   : num [1:603] 5 4 4 5 4 NA 5 4 3 NA ...
 $ q7                   : num [1:603] 5 4 4 4 4 NA 4 3 3 NA ...
 $ q8                   : num [1:603] 5 5 5 5 4 NA 5 3 4 NA ...
 $ q9                   : num [1:603] 4 4 3 5 NA NA 5 3 2 NA ...
 $ q10                  : num [1:603] 5 4 5 5 3 NA 5 3 5 NA ...
 $ TimeSpent            : num [1:603] 1555 1383 860 1599 1482 ...
 $ TimeSpent_hours      : num [1:603] 25.9 23 14.3 26.6 24.7 ...
 $ TimeSpent_std        : num [1:603] -0.181 -0.308 -0.693 -0.148 -0.235 ...
 $ int                  : num [1:603] 5 4.2 5 5 3.8 4.6 5 3 4.2 NA ...
 $ pc                   : num [1:603] 4.5 3.5 4 3.5 3.5 4 3.5 3 3 NA ...
 $ uv                   : num [1:603] 4.33 4 3.67 5 3.5 ...
 - attr(*, "spec")=
  .. cols(
  ..   student_id = col_double(),
  ..   course_id = col_character(),
  ..   total_points_possible = col_double(),
  ..   total_points_earned = col_double(),
  ..   percentage_earned = col_double(),
  ..   subject = col_character(),
  ..   semester = col_character(),
  ..   section = col_character(),
  ..   Gradebook_Item = col_character(),
  ..   Grade_Category = col_logical(),
  ..   FinalGradeCEMS = col_double(),
  ..   Points_Possible = col_double(),
  ..   Points_Earned = col_double(),
  ..   Gender = col_character(),
  ..   q1 = col_double(),
  ..   q2 = col_double(),
  ..   q3 = col_double(),
  ..   q4 = col_double(),
  ..   q5 = col_double(),
  ..   q6 = col_double(),
  ..   q7 = col_double(),
  ..   q8 = col_double(),
  ..   q9 = col_double(),
  ..   q10 = col_double(),
  ..   TimeSpent = col_double(),
  ..   TimeSpent_hours = col_double(),
  ..   TimeSpent_std = col_double(),
  ..   int = col_double(),
  ..   pc = col_double(),
  ..   uv = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
# Get summary statistics for each variable
summary(data)
   student_id     course_id         total_points_possible total_points_earned
 Min.   :43146   Length:603         Min.   :  840         Min.   :  651      
 1st Qu.:85612   Class :character   1st Qu.: 2810         1st Qu.: 2050      
 Median :88340   Mode  :character   Median : 3583         Median : 2757      
 Mean   :86070                      Mean   : 4274         Mean   : 3245      
 3rd Qu.:92730                      3rd Qu.: 5069         3rd Qu.: 3875      
 Max.   :97441                      Max.   :15552         Max.   :12208      
                                                                             
 percentage_earned   subject            semester           section         
 Min.   :0.3384    Length:603         Length:603         Length:603        
 1st Qu.:0.7047    Class :character   Class :character   Class :character  
 Median :0.7770    Mode  :character   Mode  :character   Mode  :character  
 Mean   :0.7577                                                            
 3rd Qu.:0.8262                                                            
 Max.   :0.9106                                                            
                                                                           
 Gradebook_Item     Grade_Category FinalGradeCEMS   Points_Possible 
 Length:603         Mode:logical   Min.   :  0.00   Min.   :  5.00  
 Class :character   NA's:603       1st Qu.: 71.25   1st Qu.: 10.00  
 Mode  :character                  Median : 84.57   Median : 10.00  
                                   Mean   : 77.20   Mean   : 76.87  
                                   3rd Qu.: 92.10   3rd Qu.: 30.00  
                                   Max.   :100.00   Max.   :935.00  
                                   NA's   :30                       
 Points_Earned       Gender                q1              q2       
 Min.   :  0.00   Length:603         Min.   :1.000   Min.   :1.000  
 1st Qu.:  7.00   Class :character   1st Qu.:4.000   1st Qu.:3.000  
 Median : 10.00   Mode  :character   Median :4.000   Median :4.000  
 Mean   : 68.63                      Mean   :4.296   Mean   :3.629  
 3rd Qu.: 26.12                      3rd Qu.:5.000   3rd Qu.:4.000  
 Max.   :828.20                      Max.   :5.000   Max.   :5.000  
 NA's   :92                          NA's   :123     NA's   :126    
       q3              q4              q5              q6       
 Min.   :1.000   Min.   :1.000   Min.   :2.000   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:4.000   1st Qu.:4.000   1st Qu.:4.000  
 Median :3.000   Median :4.000   Median :4.000   Median :4.000  
 Mean   :3.327   Mean   :4.268   Mean   :4.191   Mean   :4.008  
 3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000  
 Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
 NA's   :123     NA's   :125     NA's   :127     NA's   :127    
       q7              q8              q9             q10       
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:4.000   1st Qu.:3.000   1st Qu.:4.000  
 Median :4.000   Median :4.000   Median :4.000   Median :4.000  
 Mean   :3.907   Mean   :4.289   Mean   :3.487   Mean   :4.101  
 3rd Qu.:4.750   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:5.000  
 Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
 NA's   :129     NA's   :129     NA's   :129     NA's   :129    
   TimeSpent       TimeSpent_hours    TimeSpent_std          int       
 Min.   :   0.45   Min.   :  0.0075   Min.   :-1.3280   Min.   :2.000  
 1st Qu.: 851.90   1st Qu.: 14.1983   1st Qu.:-0.6996   1st Qu.:3.900  
 Median :1550.91   Median : 25.8485   Median :-0.1837   Median :4.200  
 Mean   :1799.75   Mean   : 29.9959   Mean   : 0.0000   Mean   :4.219  
 3rd Qu.:2426.09   3rd Qu.: 40.4348   3rd Qu.: 0.4623   3rd Qu.:4.700  
 Max.   :8870.88   Max.   :147.8481   Max.   : 5.2188   Max.   :5.000  
 NA's   :5         NA's   :5          NA's   :5         NA's   :76     
       pc              uv       
 Min.   :1.500   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:3.333  
 Median :3.500   Median :3.667  
 Mean   :3.608   Mean   :3.719  
 3rd Qu.:4.000   3rd Qu.:4.167  
 Max.   :5.000   Max.   :5.000  
 NA's   :75      NA's   :75     

Question: What did you find in the dataset? Share one or two things you found interesting (e.g., missing data, what data was collected, variable types, repetitive data).

  • [From the dataset, I noticed that FinalGradeCEMS is missing for several students. I also found it interesting that there are three TimeSpent columns (TimeSpent, TimeSpent_hours, and TimeSpent_std). I would like to know the difference between them.]

Cleaning Data

Let’s handle some common data cleaning tasks, like dealing with missing values and removing unnecessary columns.

A common mistake is using na.omit() to remove all rows with any missing values. Let’s see what happens when we try that.

# Remove rows with missing values
data_clean <- na.omit(data)

# Check the first couple of rows
head(data_clean)
# A tibble: 0 × 30
# ℹ 30 variables: student_id <dbl>, course_id <chr>,
#   total_points_possible <dbl>, total_points_earned <dbl>,
#   percentage_earned <dbl>, subject <chr>, semester <chr>, section <chr>,
#   Gradebook_Item <chr>, Grade_Category <lgl>, FinalGradeCEMS <dbl>,
#   Points_Possible <dbl>, Points_Earned <dbl>, Gender <chr>, q1 <dbl>,
#   q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>,
#   q9 <dbl>, q10 <dbl>, TimeSpent <dbl>, TimeSpent_hours <dbl>, …

Question: What did you discover from data_clean? Why do you think this approach isn’t working well for our dataset?

  • [data_clean has 0 rows. na.omit() removes every row that has a missing value in any column, and the summary shows that Grade_Category is NA in all 603 rows, so every row is dropped. Data cleaning should remove errors and handle missing entries to improve a dataset’s quality, but here this approach purged the entire dataset.]
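One way to see why na.omit() removes everything is to count the missing values in each column. The sketch below uses a small made-up data frame (toy, with hypothetical column names and values) in which one column is entirely NA, mirroring what the summary shows for Grade_Category.

```r
# Toy data frame (hypothetical values, not the course data)
toy <- data.frame(
  student_id     = c(1, 2, 3),
  final_grade    = c(90, NA, 75),
  grade_category = c(NA, NA, NA)  # entirely missing, like Grade_Category
)

# Count the missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# na.omit() keeps only rows with no NA in any column, so no row survives
rows_left <- nrow(na.omit(toy))
rows_left
```

On the course data, colSums(is.na(data)) would give the same kind of per-column count.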

A better approach is to handle missing values one variable at a time. Let’s remove rows where FinalGradeCEMS is missing, as this is a key variable for our analysis.

# For example, the summary of FinalGradeCEMS shows 30 NA values.
# Remove rows where FinalGradeCEMS is missing
data_clean <- data %>%
  filter(!is.na(FinalGradeCEMS))

# Check the result and see whether FinalGradeCEMS still has missing data
# TYPE YOUR CODE BELOW. USE HEAD() FUNCTION.
head(data_clean)
# A tibble: 6 × 30
  student_id course_id     total_points_possible total_points_earned
       <dbl> <chr>                         <dbl>               <dbl>
1      43146 FrScA-S216-02                  3280                2220
2      44638 OcnA-S116-01                   3531                2672
3      47448 FrScA-S216-01                  2870                1897
4      47979 OcnA-S216-01                   4562                3090
5      48797 PhysA-S116-01                  2207                1910
6      52326 AnPhA-S216-01                  4325                2255
# ℹ 26 more variables: percentage_earned <dbl>, subject <chr>, semester <chr>,
#   section <chr>, Gradebook_Item <chr>, Grade_Category <lgl>,
#   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
#   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
#   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
#   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>
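Because head() shows only the first few rows, it cannot confirm that no missing values remain. A more direct check is to count them. The sketch below assumes a made-up data frame (toy) that reuses the FinalGradeCEMS column name for illustration only.

```r
library(dplyr)

# Toy data frame (hypothetical values, not the course data)
toy <- data.frame(
  student_id     = c(1, 2, 3),
  FinalGradeCEMS = c(93.5, NA, 88.5)
)

# Keep only rows where FinalGradeCEMS is not missing
toy_clean <- toy %>%
  filter(!is.na(FinalGradeCEMS))

# Count the NAs that remain and the rows that were removed
remaining_na <- sum(is.na(toy_clean$FinalGradeCEMS))
rows_removed <- nrow(toy) - nrow(toy_clean)

remaining_na
rows_removed
```

On the course data, the equivalent check would be sum(is.na(data_clean$FinalGradeCEMS)). Since summary(data) reports 30 missing values for this column out of 603 rows, data_clean should have 573 rows.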

Sometimes you may also want to delete unnecessary columns.

# Remove unnecessary columns by using a minus sign (-) followed by the column names.
data_delete <- select(data, -Grade_Category)

# Check the data again to see if the column has been deleted.
# TYPE YOUR CODE
names(data_delete)
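To confirm that a column is really gone, it helps to inspect the new object rather than the original one. The following sketch uses a made-up data frame (toy) that borrows the Grade_Category name for illustration. The %in% test is one possible check among several.

```r
library(dplyr)

# Toy data frame (hypothetical values, not the course data)
toy <- data.frame(
  student_id     = c(1, 2),
  Grade_Category = c(NA, NA),
  Gender         = c("M", "F")
)

# Drop the column with a minus sign
toy_delete <- select(toy, -Grade_Category)

# List the remaining column names
names(toy_delete)

# TRUE would mean the column is still present
still_there <- "Grade_Category" %in% names(toy_delete)
still_there
```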

Part 3: Manipulating Data with dplyr

Now we’ll use the dplyr package (part of the tidyverse) to perform some basic data manipulation such as selecting, filtering, and creating new variables.

Selecting and Filtering

Task: Use the select() function to select student_id, subject, semester, and FinalGradeCEMS columns. Assign the result to a new object called selected_data.

Hint: If you want to select and save ONLY student_id and FinalGradeCEMS, you can use select(data, student_id, FinalGradeCEMS).

# Now TYPE YOUR CODE here for the task mentioned above.
# Select the four requested columns from the data object loaded earlier
selected_data <- select(data, student_id, subject, semester, FinalGradeCEMS)

# The example from the hint: select only student_id and FinalGradeCEMS
select(data, student_id, FinalGradeCEMS)
# A tibble: 603 × 2
   student_id FinalGradeCEMS
        <dbl>          <dbl>
 1      43146           93.5
 2      44638           81.7
 3      47448           88.5
 4      47979           81.9
 5      48797           84  
 6      51943           NA  
 7      52326           83.6
 8      52446           97.8
 9      53447           96.1
10      53475           NA  
# ℹ 593 more rows
# INSPECT YOUR DATA (Hint: USE SUMMARY(), STR(), OR HEAD() FUNCTIONS)
head(data[, c("student_id", "FinalGradeCEMS")])
# A tibble: 6 × 2
  student_id FinalGradeCEMS
       <dbl>          <dbl>
1      43146           93.5
2      44638           81.7
3      47448           88.5
4      47979           81.9
5      48797           84  
6      51943           NA  

Question: What do you notice about FinalGradeCEMS? (Hint: NAs?)

  • [I notice that some students have NA for their final grade, which means those grades are not available. To ensure a more accurate data analysis, these grades need to be entered.] {Possible answer: I notice NA values indicating missing data. This requires handling either by imputation or removal, depending on the analysis requirements.}

Task: In the chunk called select-2, select all columns except subject and section. Assign the result to a new object called reduced_data and inspect your data frame.

Hint: You can use the - symbol with the variable name(s) you want to exclude. For example, select(data, -semester).

# YOUR CODE HERE
reduced_data <- select(data,-subject,-section)


# INSPECT YOUR DATA
reduced_data
# A tibble: 603 × 28
   student_id course_id     total_points_possible total_points_earned
        <dbl> <chr>                         <dbl>               <dbl>
 1      43146 FrScA-S216-02                  3280                2220
 2      44638 OcnA-S116-01                   3531                2672
 3      47448 FrScA-S216-01                  2870                1897
 4      47979 OcnA-S216-01                   4562                3090
 5      48797 PhysA-S116-01                  2207                1910
 6      51943 FrScA-S216-03                  4208                3596
 7      52326 AnPhA-S216-01                  4325                2255
 8      52446 PhysA-S116-01                  2086                1719
 9      53447 FrScA-S116-01                  4655                3149
10      53475 FrScA-S116-02                  1710                1402
# ℹ 593 more rows
# ℹ 24 more variables: percentage_earned <dbl>, semester <chr>,
#   Gradebook_Item <chr>, Grade_Category <lgl>, FinalGradeCEMS <dbl>,
#   Points_Possible <dbl>, Points_Earned <dbl>, Gender <chr>, q1 <dbl>,
#   q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>, q7 <dbl>, q8 <dbl>,
#   q9 <dbl>, q10 <dbl>, TimeSpent <dbl>, TimeSpent_hours <dbl>,
#   TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>

Task: In the chunk called filter-3, filter the data frame to include only students in “OcnA” courses. Assign it to a new object called ocna_students and use head() to examine the result.

# YOUR CODE HERE
ocna_students <- filter(data, subject == "OcnA")

# EXAMINE YOUR DATA
head(ocna_students)
# A tibble: 6 × 30
  student_id course_id    total_points_possible total_points_earned
       <dbl> <chr>                        <dbl>               <dbl>
1      44638 OcnA-S116-01                  3531                2672
2      47979 OcnA-S216-01                  4562                3090
3      54066 OcnA-S116-01                  4641                3429
4      54282 OcnA-S116-02                  3581                2777
5      54342 OcnA-S116-02                  3256                2876
6      54346 OcnA-S116-01                  4471                3773
# ℹ 26 more variables: percentage_earned <dbl>, subject <chr>, semester <chr>,
#   section <chr>, Gradebook_Item <chr>, Grade_Category <lgl>,
#   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
#   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
#   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
#   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>

Question: How many rows does the head() function display? Hint: Check the dimensions of your tibble in the console.

  • [The head() function displays 6 rows of data.] {Possible answer: By default, head() displays the first 6 rows, as the “A tibble: 6 × 30” line in the output shows.}
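The default row count of head() can be checked on any data frame. The sketch below uses mtcars, a dataset that ships with R, rather than the course data.

```r
# mtcars is a data frame that ships with R (32 rows, 11 columns)
dim(mtcars)

# head() returns the first 6 rows by default
default_rows <- nrow(head(mtcars))

# The n argument changes how many rows are returned
ten_rows <- nrow(head(mtcars, n = 10))

default_rows
ten_rows
```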

Task: In the code chunk called filter-4, filter the data frame to remove any rows where total_points_possible is NA. Assign the result to a new object called no_na_points. Then, use glimpse() to examine all columns of your new data frame.

# TYPE YOUR CODE
no_na_points <- filter(data, !is.na(total_points_possible))

# INSPECT YOUR DATA (use glimpse())
glimpse(no_na_points)
Rows: 603
Columns: 30
$ student_id            <dbl> 43146, 44638, 47448, 47979, 48797, 51943, 52326,…
$ course_id             <chr> "FrScA-S216-02", "OcnA-S116-01", "FrScA-S216-01"…
$ total_points_possible <dbl> 3280, 3531, 2870, 4562, 2207, 4208, 4325, 2086, …
$ total_points_earned   <dbl> 2220, 2672, 1897, 3090, 1910, 3596, 2255, 1719, …
$ percentage_earned     <dbl> 0.6768293, 0.7567261, 0.6609756, 0.6773345, 0.86…
$ subject               <chr> "FrScA", "OcnA", "FrScA", "OcnA", "PhysA", "FrSc…
$ semester              <chr> "S216", "S116", "S216", "S216", "S116", "S216", …
$ section               <chr> "02", "01", "01", "01", "01", "03", "01", "01", …
$ Gradebook_Item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "ATTEMPTE…
$ Grade_Category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ FinalGradeCEMS        <dbl> 93.45372, 81.70184, 88.48758, 81.85260, 84.00000…
$ Points_Possible       <dbl> 5, 10, 10, 5, 438, 5, 10, 10, 443, 5, 12, 10, 5,…
$ Points_Earned         <dbl> NA, 10.00, NA, 4.00, 399.00, NA, NA, 10.00, 425.…
$ Gender                <chr> "M", "F", "M", "M", "F", "F", "M", "F", "F", "M"…
$ q1                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q2                    <dbl> 4, 4, 4, 5, 3, NA, 5, 3, 3, NA, NA, 5, 3, 3, NA,…
$ q3                    <dbl> 4, 3, 4, 3, 3, NA, 3, 3, 3, NA, NA, 3, 3, 5, NA,…
$ q4                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 3, 5, NA,…
$ q5                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 4, 5, NA,…
$ q6                    <dbl> 5, 4, 4, 5, 4, NA, 5, 4, 3, NA, NA, 5, 3, 5, NA,…
$ q7                    <dbl> 5, 4, 4, 4, 4, NA, 4, 3, 3, NA, NA, 5, 3, 5, NA,…
$ q8                    <dbl> 5, 5, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q9                    <dbl> 4, 4, 3, 5, NA, NA, 5, 3, 2, NA, NA, 5, 2, 2, NA…
$ q10                   <dbl> 5, 4, 5, 5, 3, NA, 5, 3, 5, NA, NA, 4, 4, 5, NA,…
$ TimeSpent             <dbl> 1555.1667, 1382.7001, 860.4335, 1598.6166, 1481.…
$ TimeSpent_hours       <dbl> 25.91944500, 23.04500167, 14.34055833, 26.643610…
$ TimeSpent_std         <dbl> -0.18051496, -0.30780313, -0.69325954, -0.148446…
$ int                   <dbl> 5.0, 4.2, 5.0, 5.0, 3.8, 4.6, 5.0, 3.0, 4.2, NA,…
$ pc                    <dbl> 4.50, 3.50, 4.00, 3.50, 3.50, 4.00, 3.50, 3.00, …
$ uv                    <dbl> 4.333333, 4.000000, 3.666667, 5.000000, 3.500000…

Question: What does ! mean in R in general?

  • [The ! operator in R is the logical NOT operator. It negates a logical value: if an expression evaluates to TRUE, applying ! to it gives FALSE, and vice versa.]
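A few lines in the console show the negation at work. The grades vector below is made up for illustration.

```r
# Negating single logical values
!TRUE
!FALSE

# Negating a logical vector, element by element
grades <- c(93.5, NA, 88.5)  # hypothetical values
is.na(grades)                # which values are missing
not_missing <- !is.na(grades)
not_missing                  # which values are present

# Use the negated vector to keep only the non-missing values
present <- grades[not_missing]
present
```

This is the same pattern that filter(data, !is.na(total_points_possible)) relies on: is.na() marks the missing values, and ! flips the result so that the non-missing rows are kept.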

Task: In the code chunk called filter-5, filter the data frame to keep only rows where FinalGradeCEMS is over 85. Assign the result to a new object called data_over85. Use glimpse() to examine all columns of your new data frame.

# We can make a condition within the filter() function.
# TYPE YOUR CODE
data_over85 <- filter(data, FinalGradeCEMS > 85)

# INSPECT YOUR DATA
glimpse(data_over85)
Rows: 279
Columns: 30
$ student_id            <dbl> 43146, 47448, 52446, 53447, 54066, 54282, 54434,…
$ course_id             <chr> "FrScA-S216-02", "FrScA-S216-01", "PhysA-S116-01…
$ total_points_possible <dbl> 3280, 2870, 2086, 4655, 4641, 3581, 3228, 7000, …
$ total_points_earned   <dbl> 2220, 1897, 1719, 3149, 3429, 2777, 2506, 4212, …
$ percentage_earned     <dbl> 0.6768293, 0.6609756, 0.8240652, 0.6764769, 0.73…
$ subject               <chr> "FrScA", "FrScA", "PhysA", "FrScA", "OcnA", "Ocn…
$ semester              <chr> "S216", "S216", "S116", "S116", "S116", "S116", …
$ section               <chr> "02", "01", "01", "01", "01", "02", "01", "01", …
$ Gradebook_Item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "POINTS E…
$ Grade_Category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ FinalGradeCEMS        <dbl> 93.45372, 88.48758, 97.77778, 96.11872, 93.90452…
$ Points_Possible       <dbl> 5, 10, 10, 443, 10, 5, 30, 10, 30, 443, 5, 15, 1…
$ Points_Earned         <dbl> NA, NA, 10.00, 425.00, 6.00, 5.00, 21.00, 10.00,…
$ Gender                <chr> "M", "M", "F", "F", "M", "F", "F", "F", "F", "F"…
$ q1                    <dbl> 5, 5, 3, 4, 4, 3, 4, NA, 3, 4, 3, 5, 4, 5, 4, 3,…
$ q2                    <dbl> 4, 4, 3, 3, 5, 3, 2, NA, 2, 2, 2, 5, 4, 4, 3, 4,…
$ q3                    <dbl> 4, 4, 3, 3, 3, 3, 2, NA, 4, 3, 2, 5, 2, 5, 3, 2,…
$ q4                    <dbl> 5, 5, 3, 4, 5, 3, 4, NA, 4, 5, 4, NA, 4, 5, 4, 4…
$ q5                    <dbl> 5, 5, 3, 4, 5, 4, 4, NA, 5, 5, 4, 5, 4, 5, 4, 3,…
$ q6                    <dbl> 5, 4, 4, 3, 5, 3, 4, NA, 3, 4, 4, 5, 4, 5, 4, 3,…
$ q7                    <dbl> 5, 4, 3, 3, 5, 3, 4, NA, 4, 3, 3, 5, 4, 5, 4, 4,…
$ q8                    <dbl> 5, 5, 3, 4, 4, 3, 4, NA, 5, 5, 4, 5, 4, 5, 4, 3,…
$ q9                    <dbl> 4, 3, 3, 2, 5, 2, 2, NA, 3, 3, 4, 4, 4, 5, 3, 1,…
$ q10                   <dbl> 5, 5, 3, 5, 4, 4, 4, NA, 3, 4, 3, 5, 4, 5, 4, 4,…
$ TimeSpent             <dbl> 1555.1667, 860.4335, 1390.2167, 1479.4166, 2625.…
$ TimeSpent_hours       <dbl> 25.919445, 14.340558, 23.170278, 24.656943, 43.7…
$ TimeSpent_std         <dbl> -0.18051496, -0.69325954, -0.30225554, -0.236421…
$ int                   <dbl> 5.0, 5.0, 3.0, 4.2, 4.4, 3.4, 4.0, 4.2, 4.0, 4.6…
$ pc                    <dbl> 4.50, 4.00, 3.00, 3.00, 4.00, 3.00, 3.00, 3.50, …
$ uv                    <dbl> 4.333333, 3.666667, 3.333333, 2.666667, 5.000000…
# Let's practice with another variable 'TimeSpent'
# Filter rows where a variable 'TimeSpent' is greater than 50
# TYPE YOUR CODE
data_timover50 <- filter(data, TimeSpent > 50)

# INSPECT YOUR DATA 
glimpse(data_timover50)
Rows: 579
Columns: 30
$ student_id            <dbl> 43146, 44638, 47448, 47979, 48797, 52326, 52446,…
$ course_id             <chr> "FrScA-S216-02", "OcnA-S116-01", "FrScA-S216-01"…
$ total_points_possible <dbl> 3280, 3531, 2870, 4562, 2207, 4325, 2086, 4655, …
$ total_points_earned   <dbl> 2220, 2672, 1897, 3090, 1910, 2255, 1719, 3149, …
$ percentage_earned     <dbl> 0.6768293, 0.7567261, 0.6609756, 0.6773345, 0.86…
$ subject               <chr> "FrScA", "OcnA", "FrScA", "OcnA", "PhysA", "AnPh…
$ semester              <chr> "S216", "S116", "S216", "S216", "S116", "S216", …
$ section               <chr> "02", "01", "01", "01", "01", "01", "01", "01", …
$ Gradebook_Item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "ATTEMPTE…
$ Grade_Category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ FinalGradeCEMS        <dbl> 93.45372, 81.70184, 88.48758, 81.85260, 84.00000…
$ Points_Possible       <dbl> 5, 10, 10, 5, 438, 10, 10, 443, 12, 10, 5, 10, 2…
$ Points_Earned         <dbl> NA, 10.00, NA, 4.00, 399.00, NA, 10.00, 425.00, …
$ Gender                <chr> "M", "F", "M", "M", "F", "M", "F", "F", "M", "M"…
$ q1                    <dbl> 5, 4, 5, 5, 4, 5, 3, 4, NA, 4, 3, 5, NA, 4, 4, N…
$ q2                    <dbl> 4, 4, 4, 5, 3, 5, 3, 3, NA, 5, 3, 3, NA, 2, 4, N…
$ q3                    <dbl> 4, 3, 4, 3, 3, 3, 3, 3, NA, 3, 3, 5, NA, 2, 3, N…
$ q4                    <dbl> 5, 4, 5, 5, 4, 5, 3, 4, NA, 5, 3, 5, NA, 4, 5, N…
$ q5                    <dbl> 5, 4, 5, 5, 4, 5, 3, 4, NA, 5, 4, 5, NA, 4, 4, N…
$ q6                    <dbl> 5, 4, 4, 5, 4, 5, 4, 3, NA, 5, 3, 5, NA, 4, 4, N…
$ q7                    <dbl> 5, 4, 4, 4, 4, 4, 3, 3, NA, 5, 3, 5, NA, 4, 5, N…
$ q8                    <dbl> 5, 5, 5, 5, 4, 5, 3, 4, NA, 4, 3, 5, NA, 4, 4, N…
$ q9                    <dbl> 4, 4, 3, 5, NA, 5, 3, 2, NA, 5, 2, 2, NA, 2, 4, …
$ q10                   <dbl> 5, 4, 5, 5, 3, 5, 3, 5, NA, 4, 4, 5, NA, 4, 4, N…
$ TimeSpent             <dbl> 1555.1667, 1382.7001, 860.4335, 1598.6166, 1481.…
$ TimeSpent_hours       <dbl> 25.919445, 23.045002, 14.340558, 26.643610, 24.6…
$ TimeSpent_std         <dbl> -0.18051496, -0.30780313, -0.69325954, -0.148446…
$ int                   <dbl> 5.0, 4.2, 5.0, 5.0, 3.8, 5.0, 3.0, 4.2, NA, 4.4,…
$ pc                    <dbl> 4.50, 3.50, 4.00, 3.50, 3.50, 3.50, 3.00, 3.00, …
$ uv                    <dbl> 4.333333, 4.000000, 3.666667, 5.000000, 3.500000…
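One detail worth knowing about the filters above: filter() keeps only rows where the condition is TRUE, so rows where the filtered variable is NA are dropped along with the rows that fail the comparison. The sketch below illustrates this on a made-up data frame (toy) that reuses the TimeSpent column name.

```r
library(dplyr)

# Toy data frame (hypothetical values, not the course data)
toy <- data.frame(
  student_id = c(1, 2, 3, 4),
  TimeSpent  = c(1555, 12, NA, 860)
)

# Rows with TimeSpent <= 50 and rows with a missing TimeSpent are both dropped
over50 <- filter(toy, TimeSpent > 50)
over50

# To look at the rows with a missing value, filter on is.na() explicitly
missing_time <- filter(toy, is.na(TimeSpent))
missing_time
```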

Mutating and Arranging Data

The mutate() function creates a new variable or modifies an existing one. Inside mutate(), the ifelse() function lets you assign values conditionally.

Task: Create a new variable called added_gender based on the existing Gender variable. If Gender is “F”, the new variable should be “Female”; otherwise, it should be “Male”.

# Create a new variable 'added_gender'
data_gender <- mutate(data, added_gender = ifelse(Gender == "F", "Female", "Male"))

# INSPECT DATA
glimpse(data_gender)
Rows: 603
Columns: 31
$ student_id            <dbl> 43146, 44638, 47448, 47979, 48797, 51943, 52326,…
$ course_id             <chr> "FrScA-S216-02", "OcnA-S116-01", "FrScA-S216-01"…
$ total_points_possible <dbl> 3280, 3531, 2870, 4562, 2207, 4208, 4325, 2086, …
$ total_points_earned   <dbl> 2220, 2672, 1897, 3090, 1910, 3596, 2255, 1719, …
$ percentage_earned     <dbl> 0.6768293, 0.7567261, 0.6609756, 0.6773345, 0.86…
$ subject               <chr> "FrScA", "OcnA", "FrScA", "OcnA", "PhysA", "FrSc…
$ semester              <chr> "S216", "S116", "S216", "S216", "S116", "S216", …
$ section               <chr> "02", "01", "01", "01", "01", "03", "01", "01", …
$ Gradebook_Item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "ATTEMPTE…
$ Grade_Category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ FinalGradeCEMS        <dbl> 93.45372, 81.70184, 88.48758, 81.85260, 84.00000…
$ Points_Possible       <dbl> 5, 10, 10, 5, 438, 5, 10, 10, 443, 5, 12, 10, 5,…
$ Points_Earned         <dbl> NA, 10.00, NA, 4.00, 399.00, NA, NA, 10.00, 425.…
$ Gender                <chr> "M", "F", "M", "M", "F", "F", "M", "F", "F", "M"…
$ q1                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q2                    <dbl> 4, 4, 4, 5, 3, NA, 5, 3, 3, NA, NA, 5, 3, 3, NA,…
$ q3                    <dbl> 4, 3, 4, 3, 3, NA, 3, 3, 3, NA, NA, 3, 3, 5, NA,…
$ q4                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 3, 5, NA,…
$ q5                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 4, 5, NA,…
$ q6                    <dbl> 5, 4, 4, 5, 4, NA, 5, 4, 3, NA, NA, 5, 3, 5, NA,…
$ q7                    <dbl> 5, 4, 4, 4, 4, NA, 4, 3, 3, NA, NA, 5, 3, 5, NA,…
$ q8                    <dbl> 5, 5, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q9                    <dbl> 4, 4, 3, 5, NA, NA, 5, 3, 2, NA, NA, 5, 2, 2, NA…
$ q10                   <dbl> 5, 4, 5, 5, 3, NA, 5, 3, 5, NA, NA, 4, 4, 5, NA,…
$ TimeSpent             <dbl> 1555.1667, 1382.7001, 860.4335, 1598.6166, 1481.…
$ TimeSpent_hours       <dbl> 25.91944500, 23.04500167, 14.34055833, 26.643610…
$ TimeSpent_std         <dbl> -0.18051496, -0.30780313, -0.69325954, -0.148446…
$ int                   <dbl> 5.0, 4.2, 5.0, 5.0, 3.8, 4.6, 5.0, 3.0, 4.2, NA,…
$ pc                    <dbl> 4.50, 3.50, 4.00, 3.50, 3.50, 4.00, 3.50, 3.00, …
$ uv                    <dbl> 4.333333, 4.000000, 3.666667, 5.000000, 3.500000…
$ added_gender          <chr> "Male", "Female", "Male", "Male", "Female", "Fem…

Question: How does ifelse() work? Hint: You can google ifelse() in R to learn about the function.

  • [The ifelse() function performs vectorized conditional operations. It evaluates the test argument element by element and returns a new vector of the same length, taking each element from the yes value where the test is TRUE and from the no value where it is FALSE.]
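To see the element-by-element behavior in isolation, here is a minimal sketch on a small vector. The `gender` vector is a hypothetical toy example, not the lesson's dataset, and it includes one missing value to show what happens in that case.

```r
# Hypothetical toy vector (not taken from the lesson's dataset)
gender <- c("F", "M", NA, "F")

# ifelse(test, yes, no) checks each element of the test in turn
labels <- ifelse(gender == "F", "Female", "Male")

print(labels)
```

Note that the missing value stays NA rather than being labeled "Male", because the test NA == "F" is itself NA. This is worth keeping in mind when the Gender column has missing entries.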

Task: In the code chunk called arrange-1, arrange the data by subject then by percentage_earned in descending order. Assign the result to a new object called arranged_classes and inspect it with str().

Hint: Use the arrange() function, and wrap a variable name in desc() to sort it in descending order. Use the str() function to examine the data type of each column in your data frame.
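As a rough illustration of sorting by two variables, the sketch below uses a hypothetical toy tibble (`toy`, with made-up `subject` and `score` columns) rather than the lesson's data.

```r
library(dplyr)

# Hypothetical toy data (not from the lesson's dataset)
toy <- tibble(
  subject = c("B", "A", "A", "B"),
  score   = c(0.70, 0.55, 0.90, 0.85)
)

# Sort by subject (A to Z), then by score from highest to lowest
# within each subject
toy_sorted <- arrange(toy, subject, desc(score))

print(toy_sorted)
```

The first variable sets the main order, and each later variable only breaks ties within the groups formed by the earlier ones.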

arranged_classes <- arrange(data, subject, desc(percentage_earned))

# INSPECT DATA
str(arranged_classes)
spc_tbl_ [603 × 30] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ student_id           : num [1:603] 70192 86488 96690 91175 86267 ...
 $ course_id            : chr [1:603] "AnPhA-S116-02" "AnPhA-S116-01" "AnPhA-S216-01" "AnPhA-S116-02" ...
 $ total_points_possible: num [1:603] 1936 3342 4804 3199 3045 ...
 $ total_points_earned  : num [1:603] 1763 3033 4309 2867 2705 ...
 $ percentage_earned    : num [1:603] 0.911 0.908 0.897 0.896 0.888 ...
 $ subject              : chr [1:603] "AnPhA" "AnPhA" "AnPhA" "AnPhA" ...
 $ semester             : chr [1:603] "S116" "S116" "S216" "S116" ...
 $ section              : chr [1:603] "02" "01" "01" "02" ...
 $ Gradebook_Item       : chr [1:603] "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" "POINTS EARNED & TOTAL COURSE POINTS" ...
 $ Grade_Category       : logi [1:603] NA NA NA NA NA NA ...
 $ FinalGradeCEMS       : num [1:603] 96 87.4 64.8 82.2 35.1 ...
 $ Points_Possible      : num [1:603] 10 28 10 5 50 15 10 10 353 460 ...
 $ Points_Earned        : num [1:603] 7 26 3 5 50 11 8 10 330 452 ...
 $ Gender               : chr [1:603] "F" "M" "F" "F" ...
 $ q1                   : num [1:603] 4 4 4 5 5 4 5 4 NA NA ...
 $ q2                   : num [1:603] 3 4 3 3 5 2 4 4 NA NA ...
 $ q3                   : num [1:603] 3 2 2 3 3 3 4 3 NA NA ...
 $ q4                   : num [1:603] 4 3 5 5 5 4 5 4 NA NA ...
 $ q5                   : num [1:603] 4 3 4 5 5 4 5 4 NA NA ...
 $ q6                   : num [1:603] 3 3 4 4 5 3 5 4 NA NA ...
 $ q7                   : num [1:603] 3 3 3 3 4 4 5 4 NA NA ...
 $ q8                   : num [1:603] 5 2 4 5 5 4 4 4 NA NA ...
 $ q9                   : num [1:603] 2 3 3 3 5 1 4 4 NA NA ...
 $ q10                  : num [1:603] 5 3 2 5 5 2 5 4 NA NA ...
 $ TimeSpent            : num [1:603] 1537 3600 1970 1315 406 ...
 $ TimeSpent_hours      : num [1:603] 25.62 60 32.83 21.92 6.77 ...
 $ TimeSpent_std        : num [1:603] -0.194 1.328 0.125 -0.358 -1.029 ...
 $ int                  : num [1:603] 4.4 3 3.8 5 5 3.9 4.6 4 4.8 4.6 ...
 $ pc                   : num [1:603] 3 2.5 2.5 3 3.5 3.5 3.75 3.5 3.5 4.5 ...
 $ uv                   : num [1:603] 2.67 3.33 3.33 3.33 5 ...
 - attr(*, "spec")=
  .. cols(
  ..   student_id = col_double(),
  ..   course_id = col_character(),
  ..   total_points_possible = col_double(),
  ..   total_points_earned = col_double(),
  ..   percentage_earned = col_double(),
  ..   subject = col_character(),
  ..   semester = col_character(),
  ..   section = col_character(),
  ..   Gradebook_Item = col_character(),
  ..   Grade_Category = col_logical(),
  ..   FinalGradeCEMS = col_double(),
  ..   Points_Possible = col_double(),
  ..   Points_Earned = col_double(),
  ..   Gender = col_character(),
  ..   q1 = col_double(),
  ..   q2 = col_double(),
  ..   q3 = col_double(),
  ..   q4 = col_double(),
  ..   q5 = col_double(),
  ..   q6 = col_double(),
  ..   q7 = col_double(),
  ..   q8 = col_double(),
  ..   q9 = col_double(),
  ..   q10 = col_double(),
  ..   TimeSpent = col_double(),
  ..   TimeSpent_hours = col_double(),
  ..   TimeSpent_std = col_double(),
  ..   int = col_double(),
  ..   pc = col_double(),
  ..   uv = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Chaining with Pipes (%>% or |>)

The pipe operator (%>% from the tidyverse, or the native |> available in R 4.1 and later) allows you to chain multiple functions together, making your code more readable. It takes the output of one function and “pipes” it in as the first argument of the next.
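The sketch below compares a nested call with its piped equivalents. The `scores` vector is a hypothetical example, and the |> line assumes R 4.1 or later.

```r
library(dplyr)  # provides the %>% pipe

# Hypothetical toy vector (not from the lesson's dataset)
scores <- c(72, 95, 88, 60)

# Nested call: read from the inside out
nested <- head(sort(scores, decreasing = TRUE), 2)

# Piped call: read from left to right
piped <- scores %>%
  sort(decreasing = TRUE) %>%
  head(2)

# Native pipe (assumes R 4.1 or later)
native <- scores |>
  sort(decreasing = TRUE) |>
  head(2)

# All three approaches give the same result
identical(nested, piped)
identical(nested, native)
```

In each piped version, the value on the left becomes the first argument of the function on the right, so the steps appear in the order they run.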

Task: In the code chunk called final-wrangle, use the pipe operator to perform the following steps in one continuous flow:

  • Start with the data object.
  • Select the student_id, subject, semester, and FinalGradeCEMS columns.
  • Filter for students in “OcnA” courses.
  • Arrange the rows by FinalGradeCEMS in descending order.
  • Assign the final result to a new object called final_data.
  • Examine the contents using a method of your choosing.
# TYPE YOUR CODE HERE
final_data <- data %>%
  select(student_id, subject, semester, FinalGradeCEMS) %>%
  filter(subject == "OcnA") %>%
  arrange(desc(FinalGradeCEMS))

# CHECK THE MANIPULATED DATA
view(final_data)

# Also, try print()
print(final_data)
# A tibble: 111 × 4
   student_id subject semester FinalGradeCEMS
        <dbl> <chr>   <chr>             <dbl>
 1      66740 OcnA    S116               99.3
 2      91163 OcnA    S216               97.4
 3      94744 OcnA    S216               96.8
 4      91818 OcnA    S116               96.5
 5      90090 OcnA    S116               96.3
 6      88168 OcnA    S116               96.0
 7      89114 OcnA    S116               95.0
 8      86758 OcnA    S116               94.6
 9      68476 OcnA    S116               94.6
10      79893 OcnA    T116               94.5
# ℹ 101 more rows

Part 4: Basic Data Visualization with ggplot2

Now that we have some clean data, let’s create a simple plot to visualize a relationship between two variables. Visualization is a key part of learning analytics.

We will use the ggplot2 package (part of the tidyverse), which uses a grammar of graphics to build plots layer by layer.
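To make the "layer by layer" idea concrete, here is a minimal sketch on hypothetical toy data. The `toy` data frame and its `hours` and `grade` columns are made up for illustration and are not the lesson's variables.

```r
library(ggplot2)

# Hypothetical toy data (not from the lesson's dataset)
toy <- data.frame(
  hours = c(5, 12, 20, 26, 31),
  grade = c(62, 70, 81, 85, 93)
)

# Start with the data and aesthetic mappings, then add one layer at a time
p <- ggplot(data = toy, mapping = aes(x = hours, y = grade)) +
  geom_point() +                                           # layer 1: one point per row
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # layer 2: straight trend line
  labs(x = "Time spent (hours)", y = "Final grade")         # axis labels

# Display the plot
p
```

Each + adds a component to the plot, so you can start with points alone and add further layers only when they help.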

Task: Create a scatterplot to visualize the relationship between FinalGradeCEMS and TimeSpent.

# Create a scatterplot of FinalGradeCEMS vs. TimeSpent
ggplot(data = data_clean, mapping = aes(x = TimeSpent, y = FinalGradeCEMS)) +
  geom_point()

Final Tasks: Practice & Submission

Congratulations, you’ve completed the first module!

Final Tasks: Practice the skills you’ve learned by writing your own code in the provided chunks. You can use the sci-online-classes.csv dataset or a dataset of your own.

# Replace "data/sci-online-classes.csv" with the path to your own dataset if you decide to use one.
your_data <- read_csv("data/sci-online-classes.csv")
Rows: 603 Columns: 30
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): course_id, subject, semester, section, Gradebook_Item, Gender
dbl (23): student_id, total_points_possible, total_points_earned, percentage...
lgl  (1): Grade_Category

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Inspect your data (feel free to use head(), str(), or summary() functions)
head(your_data)
# A tibble: 6 × 30
  student_id course_id     total_points_possible total_points_earned
       <dbl> <chr>                         <dbl>               <dbl>
1      43146 FrScA-S216-02                  3280                2220
2      44638 OcnA-S116-01                   3531                2672
3      47448 FrScA-S216-01                  2870                1897
4      47979 OcnA-S216-01                   4562                3090
5      48797 PhysA-S116-01                  2207                1910
6      51943 FrScA-S216-03                  4208                3596
# ℹ 26 more variables: percentage_earned <dbl>, subject <chr>, semester <chr>,
#   section <chr>, Gradebook_Item <chr>, Grade_Category <lgl>,
#   FinalGradeCEMS <dbl>, Points_Possible <dbl>, Points_Earned <dbl>,
#   Gender <chr>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>, q6 <dbl>,
#   q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, TimeSpent <dbl>,
#   TimeSpent_hours <dbl>, TimeSpent_std <dbl>, int <dbl>, pc <dbl>, uv <dbl>
  1. Show two different ways to use the select() function with your/current data. Inspect, and save your results as a new object.
# YOUR CODE HERE
select(final_data, semester, FinalGradeCEMS)
# A tibble: 111 × 2
   semester FinalGradeCEMS
   <chr>             <dbl>
 1 S116               99.3
 2 S216               97.4
 3 S216               96.8
 4 S116               96.5
 5 S116               96.3
 6 S116               96.0
 7 S116               95.0
 8 S116               94.6
 9 S116               94.6
10 T116               94.5
# ℹ 101 more rows
select(final_data, 1, 3)
# A tibble: 111 × 2
   student_id semester
        <dbl> <chr>   
 1      66740 S116    
 2      91163 S216    
 3      94744 S216    
 4      91818 S116    
 5      90090 S116    
 6      88168 S116    
 7      89114 S116    
 8      86758 S116    
 9      68476 S116    
10      79893 T116    
# ℹ 101 more rows
  2. Show one way to use the filter() function with your data. Inspect and save your result as a new object.
# YOUR CODE HERE
# Keep only the rows with a final grade above 85
data_over85 <- filter(data, FinalGradeCEMS > 85)

glimpse(data_over85)
Rows: 279
Columns: 30
$ student_id            <dbl> 43146, 47448, 52446, 53447, 54066, 54282, 54434,…
$ course_id             <chr> "FrScA-S216-02", "FrScA-S216-01", "PhysA-S116-01…
$ total_points_possible <dbl> 3280, 2870, 2086, 4655, 4641, 3581, 3228, 7000, …
$ total_points_earned   <dbl> 2220, 1897, 1719, 3149, 3429, 2777, 2506, 4212, …
$ percentage_earned     <dbl> 0.6768293, 0.6609756, 0.8240652, 0.6764769, 0.73…
$ subject               <chr> "FrScA", "FrScA", "PhysA", "FrScA", "OcnA", "Ocn…
$ semester              <chr> "S216", "S216", "S116", "S116", "S116", "S116", …
$ section               <chr> "02", "01", "01", "01", "01", "02", "01", "01", …
$ Gradebook_Item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "POINTS E…
$ Grade_Category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ FinalGradeCEMS        <dbl> 93.45372, 88.48758, 97.77778, 96.11872, 93.90452…
$ Points_Possible       <dbl> 5, 10, 10, 443, 10, 5, 30, 10, 30, 443, 5, 15, 1…
$ Points_Earned         <dbl> NA, NA, 10.00, 425.00, 6.00, 5.00, 21.00, 10.00,…
$ Gender                <chr> "M", "M", "F", "F", "M", "F", "F", "F", "F", "F"…
$ q1                    <dbl> 5, 5, 3, 4, 4, 3, 4, NA, 3, 4, 3, 5, 4, 5, 4, 3,…
$ q2                    <dbl> 4, 4, 3, 3, 5, 3, 2, NA, 2, 2, 2, 5, 4, 4, 3, 4,…
$ q3                    <dbl> 4, 4, 3, 3, 3, 3, 2, NA, 4, 3, 2, 5, 2, 5, 3, 2,…
$ q4                    <dbl> 5, 5, 3, 4, 5, 3, 4, NA, 4, 5, 4, NA, 4, 5, 4, 4…
$ q5                    <dbl> 5, 5, 3, 4, 5, 4, 4, NA, 5, 5, 4, 5, 4, 5, 4, 3,…
$ q6                    <dbl> 5, 4, 4, 3, 5, 3, 4, NA, 3, 4, 4, 5, 4, 5, 4, 3,…
$ q7                    <dbl> 5, 4, 3, 3, 5, 3, 4, NA, 4, 3, 3, 5, 4, 5, 4, 4,…
$ q8                    <dbl> 5, 5, 3, 4, 4, 3, 4, NA, 5, 5, 4, 5, 4, 5, 4, 3,…
$ q9                    <dbl> 4, 3, 3, 2, 5, 2, 2, NA, 3, 3, 4, 4, 4, 5, 3, 1,…
$ q10                   <dbl> 5, 5, 3, 5, 4, 4, 4, NA, 3, 4, 3, 5, 4, 5, 4, 4,…
$ TimeSpent             <dbl> 1555.1667, 860.4335, 1390.2167, 1479.4166, 2625.…
$ TimeSpent_hours       <dbl> 25.919445, 14.340558, 23.170278, 24.656943, 43.7…
$ TimeSpent_std         <dbl> -0.18051496, -0.69325954, -0.30225554, -0.236421…
$ int                   <dbl> 5.0, 5.0, 3.0, 4.2, 4.4, 3.4, 4.0, 4.2, 4.0, 4.6…
$ pc                    <dbl> 4.50, 4.00, 3.00, 3.00, 4.00, 3.00, 3.00, 3.50, …
$ uv                    <dbl> 4.333333, 3.666667, 3.333333, 2.666667, 5.000000…
  3. Show one way to use the arrange() function with your data. Inspect and save your result as a new object.
# YOUR CODE HERE
# Sort from lowest to highest final grade (ascending is the default)
arranged_classes <- arrange(final_data, FinalGradeCEMS)

str(arranged_classes)
tibble [111 × 4] (S3: tbl_df/tbl/data.frame)
 $ student_id    : num [1:111] 90996 94876 94630 87122 96856 ...
 $ subject       : chr [1:111] "OcnA" "OcnA" "OcnA" "OcnA" ...
 $ semester      : chr [1:111] "S216" "S216" "S216" "S116" ...
 $ FinalGradeCEMS: num [1:111] 1.8 2.93 3.43 8.04 13.03 ...

Render & Submit

To receive full credit, you will need to render this document and publish it via a method such as Quarto Pub, Posit Cloud, RPubs, GitHub Pages, or another service. Once you have shared a link to your published document with me and I have reviewed your work, you will be officially done with the current module.

Complete the following steps to submit your work for review:

  1. First, change the name after author: in the YAML header at the very top of this document to your name. The YAML header controls the look and feel of the rendered document but doesn’t actually display in the final output.

  2. Next, click the “Render” button in the toolbar above to “render” your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let me know if you run into any issues with rendering.

  3. Finally, publish your document by following the steps at https://docs.posit.co/cloud/guide/publish/#publish-from-a-cloud-project

If you have any questions about this module, or run into any technical issues, don’t hesitate to contact me.

Once I have checked your link, you will be notified!