Install R

https://cloud.r-project.org

Install RStudio

You should install R before installing RStudio.

https://posit.co/download/rstudio-desktop/

Using Posit Cloud

https://posit.cloud/

Create a class folder

A dedicated class folder makes it easier to find and open your data sets.
Create a folder (e.g., LTEC6505_Spring2026) somewhere on your computer.
From now on, put every Rmd and CSV file that you download from Canvas or create yourself in this folder.

RStudio panes

Source (Top left)
Console (Bottom left)
Environment/History/Connections/Tutorial (Top right)
Files/Plots/Packages/Help/Viewer/Presentation (Bottom right)

R Markdown document

Text
Code chunk
HTML report

Install a package

A package is a collection of R commands that will allow you to analyze your data.

In this course, we need to use an R package named tidyverse.
You can install an R package from a code chunk.
- Quote the package name, and use single or double quotes consistently.

#install.packages("tidyverse")

Even though the message is shown in red, it is NOT an error.
The message shows where the package is downloaded from, and where the downloaded package is installed on your computer.
You can also install a package using the Packages pane.

Using a package

In order to use an R package, you need to load it into your R session.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

This message indicates that the tidyverse package loaded other packages, such as dplyr and readr.

Also, the tidyverse_conflicts() part indicates that filter() and lag() are each defined in two packages (dplyr and stats), and that dplyr::filter() and dplyr::lag() will be used in your Rmd file because they mask the stats versions.
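When two loaded packages define a function with the same name, the package::function() notation makes the choice explicit. The sketch below assumes dplyr is installed and uses a small hypothetical tibble named demo, which is not part of the course data.

```r
library(dplyr)

# Hypothetical data for illustration only (not the course data set)
demo <- tibble::tibble(id = 1:4, score = c(70, 82, 65, 91))

# dplyr::filter() keeps the rows that satisfy a condition
high_scores <- dplyr::filter(demo, score > 75)
high_scores

# stats::filter() is a different function: it applies a linear filter
# (here, a 3-point moving average) to a numeric series
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg
```

Writing filter() without a prefix after library(tidyverse) calls the dplyr version, which is the one this course uses.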

Hands-on activity tasks

Create a blank R Markdown document in RStudio.
- Make sure to carefully follow all the steps described in the course note.
- Make sure to delete placeholder text in the blank R Markdown document.
Read in the edtech_quantitative_data_v2.csv file.

Put your Rmd file and data file in the same folder.
The file name should be single- or double-quoted.

df <- read_csv("edtech_quantitative_data_v2.csv")

## Rows: 200 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Gender, Access_Device, Instruction_Mode
## dbl (4): Student_ID, Pre_test_Score, Post_test_Score, LMS_Engagement_Index
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The message indicates that read_csv() loaded a data file consisting of 7 columns (Student_ID, Gender, Access_Device, Instruction_Mode, Pre_test_Score, Post_test_Score, LMS_Engagement_Index).
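The last line of the message suggests a way to silence it. The following is a minimal sketch, assuming readr is installed; the temporary file and its three rows are made up for illustration so that the block runs without the course data file.

```r
library(readr)

# Write a tiny hypothetical CSV file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("Student_ID,Gender,Pre_test_Score",
    "1,Female,73.5",
    "2,Male,73.0",
    "3,Female,67.6"),
  tmp
)

# show_col_types = FALSE suppresses the column specification message
demo <- read_csv(tmp, show_col_types = FALSE)

# spec() retrieves the column specification on request
spec(demo)
```

The same argument can be added to the read_csv() call for the course data file if the message is not wanted in the HTML report.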

Do not put a space between the “<” and “-” characters of the assignment operator.
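The reason the spacing matters can be seen in a short base R sketch. The object names x and comparison are hypothetical and used only for illustration.

```r
x <- 5            # assignment: x now holds the value 5

# With a space, R reads "<" and "-" separately: is x less than -5?
comparison <- x < - 5

x                 # still 5
comparison        # FALSE, because 5 is not less than -5
```

With the space, R does not report an error; it quietly performs a comparison instead of an assignment, which makes the mistake easy to miss.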

List the column names in the CSV file you loaded.

names(df)

## [1] "Student_ID"           "Gender"               "Access_Device"       
## [4] "Instruction_Mode"     "Pre_test_Score"       "Post_test_Score"     
## [7] "LMS_Engagement_Index"

Examine the first several rows of the data file you loaded.

head(df)

Find the number of columns and rows in the loaded data set.

dim(df)

## [1] 200   7
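The output of dim() lists the number of rows first and the number of columns second. A small sketch with a hypothetical data frame (not the course data) shows the related functions nrow() and ncol(), which return each count separately.

```r
# Hypothetical data frame with 4 rows and 2 columns
demo <- data.frame(id = 1:4, score = c(70, 82, 65, 91))

dim(demo)    # rows first, then columns
nrow(demo)   # number of rows only
ncol(demo)   # number of columns only
```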

Create a new R object named my_obj to save the standard deviation value of the Pre_test_Score column. Evaluate my_obj to show the computed value.

To compute a standard deviation, we can use the sd() function.
- The Help pane can be used to look up documentation.
We need to pass a set of numbers, which are the Pre_test_Score values, to the sd() function.
SD is a descriptive statistic, which you will learn about in the next module, showing how variable your data are.
A large SD means your data take on quite different values.
So we first need to gather the Pre_test_Score values from the data (df, which we call a tibble).

df$Pre_test_Score

##   [1] 73.5 73.0 67.6 59.5 67.9 70.7 70.0 73.8 79.7 72.1 72.8 70.7 75.9 72.7 73.7
##  [16] 79.0 71.1 68.6 72.2 75.7 72.5 75.3 66.7 65.7 67.3 63.7 70.6 67.8 76.2 63.5
##  [31] 77.8 65.1 63.8 65.9 79.8 69.8 72.6 69.1 69.5 67.5 70.3 72.5 78.4 60.6 78.1
##  [46] 72.6 68.2 67.3 75.2 71.1 62.3 71.8 75.7 74.3 66.9 63.3 68.4 86.3 62.3 65.8
##  [61] 72.0 69.4 69.0 80.1 65.7 69.9 71.0 66.0 62.8 71.3 79.1 72.0 70.5 70.4 68.7
##  [76] 75.9 67.5 63.2 62.1 70.4 75.3 71.2 68.4 72.3 70.9 74.1 76.6 65.2 58.3 68.2
##  [91] 64.9 74.2 66.4 60.6 67.1 62.5 78.0 72.4 73.2 72.7 74.2 75.7 67.3 62.9 65.8
## [106] 69.8 57.2 75.7 60.5 73.0 69.2 71.4 67.6 71.1 67.6 67.2 65.5 77.6 71.8 71.0
## [121] 65.6 68.8 65.1 67.5 64.0 77.9 66.2 65.6 63.4 66.1 67.5 69.8 66.8 62.0 62.4
## [136] 73.4 69.4 73.2 73.9 70.8 63.8 70.9 77.0 67.8 71.7 69.9 76.9 66.6 75.8 68.1
## [151] 66.5 70.5 71.7 79.6 71.6 71.0 78.0 73.0 66.8 76.8 66.3 66.8 65.8 65.3 71.7
## [166] 68.7 52.6 68.7 67.4 74.5 67.2 82.7 77.0 68.1 75.7 72.6 71.2 74.0 70.3 71.2
## [181] 74.1 80.3 71.3 67.7 59.9 79.8 68.7 72.7 79.5 71.1 69.0 77.1 72.3 72.9 62.4
## [196] 76.0 71.9 63.1 72.4 59.7

The $ operator extracts a column from a tibble. The extracted column, which contains 200 numeric values, is printed in the output. However, the printed output is not available for further use (unless you copy and paste the printed values).
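As a small illustration of column extraction, the sketch below uses a hypothetical base R data frame named demo; a tibble behaves the same way for this operation. The double-bracket form is shown as an equivalent alternative to $.

```r
# Hypothetical data for illustration only
demo <- data.frame(id = 1:3, score = c(70, 82, 65))

via_dollar <- demo$score          # extract the column by name with $
via_brackets <- demo[["score"]]   # equivalent extraction with [[ ]]

via_dollar
identical(via_dollar, via_brackets)
```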

If you want to use the result, you need to save it in an R object using the assignment operator.

pre_test_score_values <- df$Pre_test_Score

We can evaluate this new object, pre_test_score_values, by using its name in a code chunk.

pre_test_score_values

##   [1] 73.5 73.0 67.6 59.5 67.9 70.7 70.0 73.8 79.7 72.1 72.8 70.7 75.9 72.7 73.7
##  [16] 79.0 71.1 68.6 72.2 75.7 72.5 75.3 66.7 65.7 67.3 63.7 70.6 67.8 76.2 63.5
##  [31] 77.8 65.1 63.8 65.9 79.8 69.8 72.6 69.1 69.5 67.5 70.3 72.5 78.4 60.6 78.1
##  [46] 72.6 68.2 67.3 75.2 71.1 62.3 71.8 75.7 74.3 66.9 63.3 68.4 86.3 62.3 65.8
##  [61] 72.0 69.4 69.0 80.1 65.7 69.9 71.0 66.0 62.8 71.3 79.1 72.0 70.5 70.4 68.7
##  [76] 75.9 67.5 63.2 62.1 70.4 75.3 71.2 68.4 72.3 70.9 74.1 76.6 65.2 58.3 68.2
##  [91] 64.9 74.2 66.4 60.6 67.1 62.5 78.0 72.4 73.2 72.7 74.2 75.7 67.3 62.9 65.8
## [106] 69.8 57.2 75.7 60.5 73.0 69.2 71.4 67.6 71.1 67.6 67.2 65.5 77.6 71.8 71.0
## [121] 65.6 68.8 65.1 67.5 64.0 77.9 66.2 65.6 63.4 66.1 67.5 69.8 66.8 62.0 62.4
## [136] 73.4 69.4 73.2 73.9 70.8 63.8 70.9 77.0 67.8 71.7 69.9 76.9 66.6 75.8 68.1
## [151] 66.5 70.5 71.7 79.6 71.6 71.0 78.0 73.0 66.8 76.8 66.3 66.8 65.8 65.3 71.7
## [166] 68.7 52.6 68.7 67.4 74.5 67.2 82.7 77.0 68.1 75.7 72.6 71.2 74.0 70.3 71.2
## [181] 74.1 80.3 71.3 67.7 59.9 79.8 68.7 72.7 79.5 71.1 69.0 77.1 72.3 72.9 62.4
## [196] 76.0 71.9 63.1 72.4 59.7

sd(pre_test_score_values)

## [1] 5.245385

If we don’t want to use the value later, we don’t have to save the result in a new R object.

sd(df$Pre_test_Score)

## [1] 5.245385

The computed value is shown in the output of the Rmd document (or the Console), but it is discarded immediately. Thus, we need to create a new R object to keep this value.

my_obj <- sd(pre_test_score_values)

In the Environment pane, we will see this new object. We can evaluate it in a code chunk.

my_obj

## [1] 5.245385
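To see what sd() computes, the sample standard deviation can be reproduced by hand on a few numbers. This is a sketch with made-up scores (not the course data); it uses the sample formula that divides by n - 1, which is the formula sd() applies.

```r
scores_close <- c(60, 70, 80)    # values near each other
scores_spread <- c(40, 70, 100)  # same mean, more spread out

# Sample SD by hand: square root of the sum of squared deviations
# from the mean, divided by (n - 1)
manual_sd <- sqrt(sum((scores_close - mean(scores_close))^2) /
                    (length(scores_close) - 1))

manual_sd            # 10
sd(scores_close)     # 10, matches the manual calculation
sd(scores_spread)    # 30, larger because the values differ more
```

Both sets of scores have a mean of 70, so the difference in SD reflects only how far the values sit from that mean.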

Create a new tibble object by selecting the LMS_Engagement_Index, Pre_test_Score and Post_test_Score columns, and evaluate this new tibble object.

To select columns in a tibble, we need to use the select() function.

df |>
  select(LMS_Engagement_Index, Pre_test_Score, Post_test_Score)

Even though we see these three columns, they are not usable because they are discarded after they were printed.

In order to further use the outcome of running R code, you need to save the outcome using the assignment operator. If an object named df2 already exists, it will be replaced with the new result because we are reusing the same object name.

df2 <- df |>
  select(LMS_Engagement_Index, Pre_test_Score, Post_test_Score)

head(df2)

Now df2 has only three columns.

names(df)

## [1] "Student_ID"           "Gender"               "Access_Device"       
## [4] "Instruction_Mode"     "Pre_test_Score"       "Post_test_Score"     
## [7] "LMS_Engagement_Index"

names(df2)

## [1] "LMS_Engagement_Index" "Pre_test_Score"       "Post_test_Score"
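The difference between printing a result and storing it can be checked on a small example. The sketch below assumes dplyr is installed; the tibble demo and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data for illustration only
demo <- tibble::tibble(
  id = 1:3,
  pre = c(70, 65, 80),
  post = c(75, 72, 88)
)

# Printed only: the result is not stored anywhere
demo |>
  select(pre, post)

# Stored: the result is kept in a new object
demo_scores <- demo |>
  select(pre, post)

names(demo)         # the original tibble still has all three columns
names(demo_scores)  # the new tibble has only the selected columns
```

select() never changes the tibble it is given; only the assignment decides where the result ends up.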

Create another tibble object by selecting 10 rows (from the 5th to the 14th row) from the tibble created in step 2. Evaluate this new tibble to show its content.

Original data (df)

df

df3 <- df |>
  slice(5:14)

df3
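The row range 5:14 includes both endpoints, which is why it yields 10 rows (14 - 5 + 1). A minimal sketch, assuming dplyr is installed and using a hypothetical 20-row tibble named demo:

```r
library(dplyr)

# Hypothetical tibble whose row_id column records the original row position
demo <- tibble::tibble(row_id = 1:20, value = (1:20) * 10)

demo_rows <- demo |>
  slice(5:14)

nrow(demo_rows)     # 10 rows
demo_rows$row_id    # original positions 5 through 14
```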

Create a histogram showing the Pre_test_Score values.
- Use “Pre-test Score” for the X axis label.
- Use “Count” for the Y axis label.
- Use “white” for the fill color of the histogram bars.
- Use “black” for the line color of the histogram bars.

We can use the template code on page 23 of the course note.

Copied as-is from the course note, the template code fails because df does not contain an “hs_gpa” column.

We need to replace column names in the template with appropriate values found in our tibble object.

df |>
  ggplot(aes(x = Pre_test_Score)) +
  geom_histogram(color = "black", fill = "white") +
  labs(x = "Pre-test Score", y = "Count")

## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

The message indicates that, because no bin width was specified, R used the default of 30 bins to create the histogram.
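The message goes away once a bin width (or a number of bins) is chosen explicitly. The sketch below assumes ggplot2 is installed; the data frame demo and the bin width of 5 are arbitrary choices for illustration, not values recommended by the course note.

```r
library(ggplot2)

# Hypothetical scores for illustration only
demo <- data.frame(
  score = c(52, 58, 61, 63, 66, 68, 70, 71, 73, 75, 78, 82)
)

# binwidth = 5 makes each bar cover 5 score points;
# bins = 10 would instead fix the number of bars
p <- ggplot(demo, aes(x = score)) +
  geom_histogram(binwidth = 5, color = "black", fill = "white") +
  labs(x = "Score", y = "Count")
p
```

A suitable bin width depends on the range and the amount of data, so it is worth trying a few values.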

A very common mistake…

The error message indicates that you put + at the start of a new line; the + must go at the end of the previous line.
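The placement rule can be illustrated as follows. This sketch assumes ggplot2 is installed and uses a hypothetical data frame named demo; the incorrect version is kept as comments so that the block still runs.

```r
library(ggplot2)

# Hypothetical scores for illustration only
demo <- data.frame(score = c(60, 65, 70, 75, 80))

# Incorrect (kept as comments): the first line is already a complete
# command, so R runs it and then fails on the line that starts with "+"
#
# ggplot(demo, aes(x = score))
#   + geom_histogram(binwidth = 5)

# Correct: end each line with "+" so R knows the command continues
p <- ggplot(demo, aes(x = score)) +
  geom_histogram(binwidth = 5)
p
```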

Create an HTML report from the R Markdown document.

- When your Rmd file contains a code chunk that cannot be executed, you won’t get a rendered HTML report.
- Make sure all code chunks in your Rmd file are runnable (in other words, no errors).

LTEC 6510: Hands-on Activity 1