Install R

https://cloud.r-project.org

Install RStudio

You should install R before installing RStudio.

https://posit.co/download/rstudio-desktop/

Using Posit Cloud

https://posit.cloud/

Create a class folder

RStudio panes

  1. Source (Top left)
  2. Console (Bottom left)
  3. Environment/History/Connections/Tutorial (Top right)
  4. Files/Plots/Packages/Help/Viewer/Presentation (Bottom right)

R Markdown document

Install a package

A package is a collection of R commands that will allow you to analyze your data.

#install.packages("tidyverse")
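A package only needs to be installed once per computer, which is likely why the command above is commented out. As a minimal sketch (one common pattern, not necessarily the approach the course note uses), you can check whether the package is already installed and install it only when it is missing:

```r
# requireNamespace() returns TRUE when the package is installed, FALSE otherwise
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  # This line runs only when tidyverse is not installed yet
  install.packages("tidyverse")
}
```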

Using a package

In order to use an R package, you need to load it into your R session.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

This message indicates that the tidyverse package loaded other packages, such as dplyr and readr.

Also, the tidyverse_conflicts() part indicates that filter() and lag() are each defined in two packages (dplyr and stats). Because dplyr masks stats, dplyr::filter() and dplyr::lag() will be used when you call filter() or lag() in your Rmd file.
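The following sketch illustrates what masking means in practice. The tibble named toy is a made-up example, not part of the course data.

```r
library(tidyverse)

# Made-up data for illustration only
toy <- tibble(x = 1:5)

# After loading tidyverse, a bare filter() call refers to dplyr::filter()
rows_bare <- filter(toy, x > 3)

# The package::function form states the choice explicitly
rows_explicit <- dplyr::filter(toy, x > 3)

# Both calls keep the same rows
identical(rows_bare, rows_explicit)

# The masked version from stats is still reachable by its full name
stats::filter(1:5, rep(0.5, 2))
```

Writing the package name in front of the function is optional here, but it can make it clearer which version of a function you intend to use.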

Hands-on activity tasks

  1. Create a blank R Markdown document in RStudio.
    • Make sure to carefully follow all the steps described in the course note.
    • Make sure to delete placeholder text in the blank R Markdown document.
  2. Read in the edtech_quantitative_data_v2.csv file.
df <- read_csv("edtech_quantitative_data_v2.csv")
## Rows: 200 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Gender, Access_Device, Instruction_Mode
## dbl (4): Student_ID, Pre_test_Score, Post_test_Score, LMS_Engagement_Index
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The message indicates that read_csv() loaded a data file consisting of 200 rows and 7 columns (Student_ID, Gender, Access_Device, Instruction_Mode, Pre_test_Score, Post_test_Score, LMS_Engagement_Index). Three columns were read as character (chr) and four as numeric (dbl).
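As the message suggests, the column report can be turned off with show_col_types = FALSE. The sketch below assumes a readr version that accepts literal text wrapped in I(), and it uses two made-up rows in the same column layout instead of the real file.

```r
library(readr)

# Two made-up rows standing in for the real CSV file (values are hypothetical)
csv_text <- I("Student_ID,Gender,Access_Device,Instruction_Mode,Pre_test_Score,Post_test_Score,LMS_Engagement_Index
1,F,Laptop,Online,70.5,80.1,55.2
2,M,Tablet,Hybrid,65.0,72.3,40.8")

# show_col_types = FALSE suppresses the column specification message
toy <- read_csv(csv_text, show_col_types = FALSE)

# spec() retrieves the column specification when you do want to see it
spec(toy)
```

With the real data, the same argument would go in the original call, for example read_csv("edtech_quantitative_data_v2.csv", show_col_types = FALSE).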

  3. List the column names in the CSV file you loaded.
names(df)
## [1] "Student_ID"           "Gender"               "Access_Device"       
## [4] "Instruction_Mode"     "Pre_test_Score"       "Post_test_Score"     
## [7] "LMS_Engagement_Index"
head(df)
  4. Find the number of columns and rows in the loaded data set.
dim(df)
## [1] 200   7
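dim() reports the number of rows first and the number of columns second. As a small sketch with a made-up tibble, nrow() and ncol() return each value separately:

```r
library(tibble)

# Made-up tibble for illustration only
toy <- tibble(id = 1:3, group = c("a", "b", "c"))

dim(toy)   # number of rows, then number of columns
nrow(toy)  # number of rows only
ncol(toy)  # number of columns only
```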
  5. Create a new R object named my_obj to save the standard deviation of the Pre_test_Score column. Evaluate my_obj to show the computed value.
df$Pre_test_Score
##   [1] 73.5 73.0 67.6 59.5 67.9 70.7 70.0 73.8 79.7 72.1 72.8 70.7 75.9 72.7 73.7
##  [16] 79.0 71.1 68.6 72.2 75.7 72.5 75.3 66.7 65.7 67.3 63.7 70.6 67.8 76.2 63.5
##  [31] 77.8 65.1 63.8 65.9 79.8 69.8 72.6 69.1 69.5 67.5 70.3 72.5 78.4 60.6 78.1
##  [46] 72.6 68.2 67.3 75.2 71.1 62.3 71.8 75.7 74.3 66.9 63.3 68.4 86.3 62.3 65.8
##  [61] 72.0 69.4 69.0 80.1 65.7 69.9 71.0 66.0 62.8 71.3 79.1 72.0 70.5 70.4 68.7
##  [76] 75.9 67.5 63.2 62.1 70.4 75.3 71.2 68.4 72.3 70.9 74.1 76.6 65.2 58.3 68.2
##  [91] 64.9 74.2 66.4 60.6 67.1 62.5 78.0 72.4 73.2 72.7 74.2 75.7 67.3 62.9 65.8
## [106] 69.8 57.2 75.7 60.5 73.0 69.2 71.4 67.6 71.1 67.6 67.2 65.5 77.6 71.8 71.0
## [121] 65.6 68.8 65.1 67.5 64.0 77.9 66.2 65.6 63.4 66.1 67.5 69.8 66.8 62.0 62.4
## [136] 73.4 69.4 73.2 73.9 70.8 63.8 70.9 77.0 67.8 71.7 69.9 76.9 66.6 75.8 68.1
## [151] 66.5 70.5 71.7 79.6 71.6 71.0 78.0 73.0 66.8 76.8 66.3 66.8 65.8 65.3 71.7
## [166] 68.7 52.6 68.7 67.4 74.5 67.2 82.7 77.0 68.1 75.7 72.6 71.2 74.0 70.3 71.2
## [181] 74.1 80.3 71.3 67.7 59.9 79.8 68.7 72.7 79.5 71.1 69.0 77.1 72.3 72.9 62.4
## [196] 76.0 71.9 63.1 72.4 59.7

The $ operator extracts a column from a tibble. The extracted column, which contains 200 numeric values, is printed in the output. However, the printed values cannot be used in later code (unless you copy and paste them).
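The $ operator is one of several ways to extract a column. The sketch below uses a made-up tibble (the values are hypothetical) to show that $, double brackets, and dplyr's pull() all return the same vector.

```r
library(tidyverse)

# Made-up scores, not taken from the course data set
toy <- tibble(Pre_test_Score = c(70.5, 65.0, 72.3))

by_dollar  <- toy$Pre_test_Score          # $ operator
by_bracket <- toy[["Pre_test_Score"]]     # double-bracket extraction
by_pull    <- pull(toy, Pre_test_Score)   # dplyr's pull()

# All three approaches give the same numeric vector
identical(by_dollar, by_bracket)
identical(by_dollar, by_pull)
```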

If you want to use the result, you need to save it in an R object using the assignment operator (<-).

pre_test_score_values <- df$Pre_test_Score

We can evaluate this new object, pre_test_score_values, by using its name in a code chunk.

pre_test_score_values
##   [1] 73.5 73.0 67.6 59.5 67.9 70.7 70.0 73.8 79.7 72.1 72.8 70.7 75.9 72.7 73.7
##  [16] 79.0 71.1 68.6 72.2 75.7 72.5 75.3 66.7 65.7 67.3 63.7 70.6 67.8 76.2 63.5
##  [31] 77.8 65.1 63.8 65.9 79.8 69.8 72.6 69.1 69.5 67.5 70.3 72.5 78.4 60.6 78.1
##  [46] 72.6 68.2 67.3 75.2 71.1 62.3 71.8 75.7 74.3 66.9 63.3 68.4 86.3 62.3 65.8
##  [61] 72.0 69.4 69.0 80.1 65.7 69.9 71.0 66.0 62.8 71.3 79.1 72.0 70.5 70.4 68.7
##  [76] 75.9 67.5 63.2 62.1 70.4 75.3 71.2 68.4 72.3 70.9 74.1 76.6 65.2 58.3 68.2
##  [91] 64.9 74.2 66.4 60.6 67.1 62.5 78.0 72.4 73.2 72.7 74.2 75.7 67.3 62.9 65.8
## [106] 69.8 57.2 75.7 60.5 73.0 69.2 71.4 67.6 71.1 67.6 67.2 65.5 77.6 71.8 71.0
## [121] 65.6 68.8 65.1 67.5 64.0 77.9 66.2 65.6 63.4 66.1 67.5 69.8 66.8 62.0 62.4
## [136] 73.4 69.4 73.2 73.9 70.8 63.8 70.9 77.0 67.8 71.7 69.9 76.9 66.6 75.8 68.1
## [151] 66.5 70.5 71.7 79.6 71.6 71.0 78.0 73.0 66.8 76.8 66.3 66.8 65.8 65.3 71.7
## [166] 68.7 52.6 68.7 67.4 74.5 67.2 82.7 77.0 68.1 75.7 72.6 71.2 74.0 70.3 71.2
## [181] 74.1 80.3 71.3 67.7 59.9 79.8 68.7 72.7 79.5 71.1 69.0 77.1 72.3 72.9 62.4
## [196] 76.0 71.9 63.1 72.4 59.7
sd(pre_test_score_values)
## [1] 5.245385

If we don’t want to use the value later, we don’t have to save the result in a new R object.

sd(df$Pre_test_Score)
## [1] 5.245385

The computed value is shown in the output of the Rmd document (or the Console), but it is discarded immediately. Thus, we need to create a new R object to keep this value.

my_obj <- sd(pre_test_score_values)

In the Environment pane, we will see this new object. We can evaluate it in a code chunk.

my_obj
## [1] 5.245385
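Once a value is saved in an object, it can be reused in later code. A minimal sketch with made-up scores (toy_scores and toy_sd are hypothetical names):

```r
# Made-up scores for illustration only
toy_scores <- c(70.5, 65.0, 72.3, 68.9)

# Save the standard deviation so that it can be reused
toy_sd <- sd(toy_scores)

# Reuse the saved value without recomputing it
round(toy_sd, 2)   # rounded to two decimal places
toy_sd * 2         # twice the standard deviation
```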
  6. Create a new tibble object by selecting the LMS_Engagement_Index, Pre_test_Score, and Post_test_Score columns, and evaluate this new tibble object.

To select columns in a tibble, we need to use the select() function.

df |>
  select(LMS_Engagement_Index, Pre_test_Score, Post_test_Score)

Even though we see these three columns in the output, the result is not usable later because it is discarded after it is printed.

To use the result of a command later, you need to save it using the assignment operator. Here we save the selected columns in an object named df2. If an object named df2 already exists, it will be replaced with the new result.

df2 <- df |>
  select(LMS_Engagement_Index, Pre_test_Score, Post_test_Score)
head(df2)

Now df2 has only three columns.

names(df)
## [1] "Student_ID"           "Gender"               "Access_Device"       
## [4] "Instruction_Mode"     "Pre_test_Score"       "Post_test_Score"     
## [7] "LMS_Engagement_Index"
names(df2)
## [1] "LMS_Engagement_Index" "Pre_test_Score"       "Post_test_Score"
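The same behavior can be seen in a small sketch with a made-up tibble (toy, toy2, and the column names are hypothetical): select() leaves the original object unchanged, and assigning to an existing name replaces what that name held before.

```r
library(tidyverse)

# Made-up tibble for illustration only
toy <- tibble(id = 1:3, pre = c(70, 65, 72), post = c(80, 70, 75))

# select() returns a new tibble; toy itself is not modified
toy2 <- toy |>
  select(pre, post)

names(toy)   # still has all three columns
names(toy2)  # only the selected columns

# Assigning to the same name again replaces the earlier toy2
toy2 <- toy |>
  select(post)

names(toy2)
```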
  7. Create another tibble object by selecting 10 rows (the 5th through 14th rows) from the tibble created in step 2. Evaluate this new tibble to show its content.

Original data (df)

df
df3 <- df |>
  slice(5:14)

df3
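slice() selects rows by position, and 5:14 is the sequence of integers from 5 to 14, so the result has 10 rows. A sketch with a made-up tibble whose only column records the original row number:

```r
library(tidyverse)

# Made-up tibble with 20 rows; row_id records each row's original position
toy <- tibble(row_id = 1:20)

# Keep the 5th through 14th rows
toy_slice <- toy |>
  slice(5:14)

nrow(toy_slice)    # number of rows kept
toy_slice$row_id   # original positions of the rows that were kept
```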
  8. Create a histogram showing Pre_test_Score values.
    • Use “Pre-test Score” for the X axis label.
    • Use “Count” for the Y axis label.
    • Use “white” for the fill color of the histogram bars.
    • Use “black” for the line color of the histogram bars.

We can use the template code on page 23 of the course note.

We need to replace column names in the template with appropriate values found in our tibble object.

df |>
  ggplot(aes(x = Pre_test_Score)) +
  geom_histogram(color = "black", fill = "white") +
  labs(x = "Pre-test Score", y = "Count")
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

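The message indicates that, because we did not specify the number of bins, geom_histogram() used the default of 30 bins. You can choose a different value with the bins or binwidth argument.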
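As a sketch of how the bins can be set explicitly, the code below uses simulated scores (made up with rnorm(), not the course data) and an arbitrary bin width of 2. When binwidth is supplied, the message about bins = 30 is not shown.

```r
library(ggplot2)

# Simulated scores for illustration only
set.seed(1)
toy <- data.frame(score = rnorm(200, mean = 70, sd = 5))

# binwidth = 2 makes each bar cover 2 score points (an arbitrary choice)
p <- ggplot(toy, aes(x = score)) +
  geom_histogram(binwidth = 2, color = "black", fill = "white") +
  labs(x = "Score", y = "Count")

p
```

Alternatively, bins = 15 (or any other number) sets the number of bins instead of their width.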

Very common mistake…

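The error message indicates that you put + at the start of a new line. In ggplot2 code, + must go at the end of a line, not at the beginning of the next one.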
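The difference can be sketched with made-up data as follows. The failing version is shown only in comments so that the chunk stays runnable.

```r
library(ggplot2)

# Made-up scores for illustration only
toy <- data.frame(score = c(60, 65, 70, 70, 75, 80))

# Works: every line that continues the plot ends with +
p <- ggplot(toy, aes(x = score)) +
  geom_histogram(bins = 5)

# Does not work: the first line is already a complete command, so R runs it
# on its own and then fails on the line that starts with +
# ggplot(toy, aes(x = score))
#   + geom_histogram(bins = 5)
```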

  9. Create an HTML report from the R Markdown document.

    • When your Rmd file contains a code chunk that cannot be executed, you won’t get a rendered HTML report.
    • You need to make sure all code chunks in your Rmd are runnable (in other words, they produce no errors).