You should install R before installing RStudio.
A package is a collection of R commands that will allow you to analyze your data.
#install.packages("tidyverse")
Even though the message is shown in red, it is NOT an error.
The message shows where the package is downloaded from, and where the downloaded package is installed on your computer.
You can also install a package using the Packages pane in RStudio.
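If you render the same Rmd file many times, reinstalling a package every time is slow. A common pattern is to install a package only when it is missing. The helper below, `install_if_missing()`, is a hypothetical name used only for illustration. `requireNamespace()` and `install.packages()` are base R functions.

```r
# Hypothetical helper (not part of any package): install a package
# only if it cannot already be found on this computer.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed; nothing to do
  }
  install.packages(pkg)       # downloads and installs the package, as described above
  invisible(TRUE)
}

# Usage (commented out, like the install line above):
# install_if_missing("tidyverse")
```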
In order to use an R package, you need to load it into your R session.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
This message indicates that loading the tidyverse package also loaded other packages, such as dplyr, readr, and so on.
Also, the tidyverse_conflicts() part indicates that filter() and lag() are defined in two packages (dplyr and stats), and that dplyr::filter() and dplyr::lag() (not stats::filter() and stats::lag()) will be used in your Rmd file.
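To call a specific version of a masked function, write the package name, two colons, and the function name. A sketch with made-up numbers:

```r
scores <- c(10, 20, 30, 40)

# After library(tidyverse), plain lag() means dplyr::lag():
# each value moves one position later, and NA fills the first slot.
dplyr::lag(scores)

# stats::filter() applies a linear filter to a vector; here a
# 2-point moving average (it returns a time-series object).
stats::filter(scores, rep(1 / 2, 2))

# dplyr::filter() instead keeps the rows of a data frame that meet a condition.
toy <- data.frame(id = 1:4, score = scores)
dplyr::filter(toy, score > 15)
```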
Read the edtech_quantitative_data_v2.csv file.
df <- read_csv("edtech_quantitative_data_v2.csv")
## Rows: 200 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Gender, Access_Device, Instruction_Mode
## dbl (4): Student_ID, Pre_test_Score, Post_test_Score, LMS_Engagement_Index
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The message indicates that read_csv() loaded a data file consisting of 200 rows and 7 columns (Student_ID, Gender, Access_Device, Instruction_Mode, Pre_test_Score, Post_test_Score, LMS_Engagement_Index). Three columns were read as character (chr) values and four as numbers (dbl).
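The message also suggests two ways to quiet it. A sketch on a tiny made-up CSV file written to a temporary location (it is not edtech_quantitative_data_v2.csv): `show_col_types = FALSE` hides the message, and `col_types` declares each column's type instead of letting read_csv() guess.

```r
library(readr)

# A tiny made-up CSV file (not the course data) in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("Student_ID,Gender,Pre_test_Score",
             "1,F,73.5",
             "2,M,73.0"), path)

# show_col_types = FALSE quiets the column-specification message;
# read_csv() still guesses the types (numbers become dbl).
quiet <- read_csv(path, show_col_types = FALSE)

# col_types states each column's type explicitly, which also
# suppresses the message.
typed <- read_csv(path, col_types = cols(
  Student_ID     = col_integer(),
  Gender         = col_character(),
  Pre_test_Score = col_double()
))

spec(typed)  # shows the column specification that was used
```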
names(df)
## [1] "Student_ID" "Gender" "Access_Device"
## [4] "Instruction_Mode" "Pre_test_Score" "Post_test_Score"
## [7] "LMS_Engagement_Index"
head(df)
dim(df)
## [1] 200 7
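Besides dim(), nrow() and ncol() return the two numbers separately, and glimpse() gives a compact overview of every column. A sketch on a small made-up tibble standing in for df:

```r
library(dplyr)

# A small made-up tibble standing in for df (not the course data).
toy <- tibble(
  Student_ID     = 1:3,
  Pre_test_Score = c(73.5, 73.0, 67.6)
)

dim(toy)      # number of rows, then number of columns
nrow(toy)     # number of rows only
ncol(toy)     # number of columns only
glimpse(toy)  # one line per column: its name, type, and first values
```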
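Create an R object, my_obj, to save the standard deviation value of the Pre_test_Score column. Evaluate my_obj to show the computed value.
To compute a standard deviation, we use the sd() function. First, we extract the Pre_test_Score column from our data (df, which we call a tibble).
df$Pre_test_Score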
## [1] 73.5 73.0 67.6 59.5 67.9 70.7 70.0 73.8 79.7 72.1 72.8 70.7 75.9 72.7 73.7
## [16] 79.0 71.1 68.6 72.2 75.7 72.5 75.3 66.7 65.7 67.3 63.7 70.6 67.8 76.2 63.5
## [31] 77.8 65.1 63.8 65.9 79.8 69.8 72.6 69.1 69.5 67.5 70.3 72.5 78.4 60.6 78.1
## [46] 72.6 68.2 67.3 75.2 71.1 62.3 71.8 75.7 74.3 66.9 63.3 68.4 86.3 62.3 65.8
## [61] 72.0 69.4 69.0 80.1 65.7 69.9 71.0 66.0 62.8 71.3 79.1 72.0 70.5 70.4 68.7
## [76] 75.9 67.5 63.2 62.1 70.4 75.3 71.2 68.4 72.3 70.9 74.1 76.6 65.2 58.3 68.2
## [91] 64.9 74.2 66.4 60.6 67.1 62.5 78.0 72.4 73.2 72.7 74.2 75.7 67.3 62.9 65.8
## [106] 69.8 57.2 75.7 60.5 73.0 69.2 71.4 67.6 71.1 67.6 67.2 65.5 77.6 71.8 71.0
## [121] 65.6 68.8 65.1 67.5 64.0 77.9 66.2 65.6 63.4 66.1 67.5 69.8 66.8 62.0 62.4
## [136] 73.4 69.4 73.2 73.9 70.8 63.8 70.9 77.0 67.8 71.7 69.9 76.9 66.6 75.8 68.1
## [151] 66.5 70.5 71.7 79.6 71.6 71.0 78.0 73.0 66.8 76.8 66.3 66.8 65.8 65.3 71.7
## [166] 68.7 52.6 68.7 67.4 74.5 67.2 82.7 77.0 68.1 75.7 72.6 71.2 74.0 70.3 71.2
## [181] 74.1 80.3 71.3 67.7 59.9 79.8 68.7 72.7 79.5 71.1 69.0 77.1 72.3 72.9 62.4
## [196] 76.0 71.9 63.1 72.4 59.7
The $ operator extracts a column from a tibble. The extracted column, which contains 200 numeric values, is printed in the output. However, the printed values are not stored anywhere, so we cannot use them later (unless you want to copy and paste the printed values).
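The $ operator is not the only way to extract a column. A sketch on a small made-up tibble comparing $ with [[ ]] (base R) and pull() (dplyr). All three return the same numeric vector, while single brackets return a one-column tibble instead:

```r
library(dplyr)

# Made-up tibble standing in for df (not the course data).
toy <- tibble(Pre_test_Score = c(73.5, 73.0, 67.6))

toy$Pre_test_Score           # $ operator, as above
toy[["Pre_test_Score"]]      # double square brackets (base R)
toy |> pull(Pre_test_Score)  # pull() from dplyr, handy inside a pipe
toy["Pre_test_Score"]        # single brackets: still a tibble, not a vector
```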
If you want to use the result, you need to save it in an R object using the assignment operator (<-).
pre_test_score_values <- df$Pre_test_Score
We can evaluate this new object, pre_test_score_values, by using its name in a code chunk.
pre_test_score_values
## [1] 73.5 73.0 67.6 59.5 67.9 70.7 70.0 73.8 79.7 72.1 72.8 70.7 75.9 72.7 73.7
## [16] 79.0 71.1 68.6 72.2 75.7 72.5 75.3 66.7 65.7 67.3 63.7 70.6 67.8 76.2 63.5
## [31] 77.8 65.1 63.8 65.9 79.8 69.8 72.6 69.1 69.5 67.5 70.3 72.5 78.4 60.6 78.1
## [46] 72.6 68.2 67.3 75.2 71.1 62.3 71.8 75.7 74.3 66.9 63.3 68.4 86.3 62.3 65.8
## [61] 72.0 69.4 69.0 80.1 65.7 69.9 71.0 66.0 62.8 71.3 79.1 72.0 70.5 70.4 68.7
## [76] 75.9 67.5 63.2 62.1 70.4 75.3 71.2 68.4 72.3 70.9 74.1 76.6 65.2 58.3 68.2
## [91] 64.9 74.2 66.4 60.6 67.1 62.5 78.0 72.4 73.2 72.7 74.2 75.7 67.3 62.9 65.8
## [106] 69.8 57.2 75.7 60.5 73.0 69.2 71.4 67.6 71.1 67.6 67.2 65.5 77.6 71.8 71.0
## [121] 65.6 68.8 65.1 67.5 64.0 77.9 66.2 65.6 63.4 66.1 67.5 69.8 66.8 62.0 62.4
## [136] 73.4 69.4 73.2 73.9 70.8 63.8 70.9 77.0 67.8 71.7 69.9 76.9 66.6 75.8 68.1
## [151] 66.5 70.5 71.7 79.6 71.6 71.0 78.0 73.0 66.8 76.8 66.3 66.8 65.8 65.3 71.7
## [166] 68.7 52.6 68.7 67.4 74.5 67.2 82.7 77.0 68.1 75.7 72.6 71.2 74.0 70.3 71.2
## [181] 74.1 80.3 71.3 67.7 59.9 79.8 68.7 72.7 79.5 71.1 69.0 77.1 72.3 72.9 62.4
## [196] 76.0 71.9 63.1 72.4 59.7
sd(pre_test_score_values)
## [1] 5.245385
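sd() returns the sample standard deviation: it divides the sum of squared deviations from the mean by n - 1 (not n) and then takes the square root. A minimal sketch with made-up numbers, computing the same value by hand:

```r
# Made-up scores (not the course data).
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))  # sample SD formula

manual_sd
sd(x)  # same value as manual_sd
```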
If we don’t want to use the value later, we don’t have to save the result in a new R object.
sd(df$Pre_test_Score)
## [1] 5.245385
The computed value is shown in the output of the Rmd document (or in the Console), but it is discarded immediately. Thus, to keep this value, we need to save it in a new R object.
my_obj <- sd(pre_test_score_values)
In the Environment pane, we will see this new object. We can evaluate it in a code chunk.
my_obj
## [1] 5.245385
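The printed Pre_test_Score values contain no missing values, so sd() works directly. If a column did contain missing values (NA), sd() would return NA unless you add `na.rm = TRUE`. A small sketch with a made-up vector; `my_obj2` is a hypothetical name:

```r
# Made-up vector with one missing value (NA).
scores <- c(70.5, 68.2, NA, 74.1)

sd(scores)                # NA: one missing value makes the whole result NA
sd(scores, na.rm = TRUE)  # removes the NA first, then computes the SD

my_obj2 <- sd(scores, na.rm = TRUE)  # save the result to reuse it later
my_obj2
```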
Create a tibble object, df2, that contains only the LMS_Engagement_Index, Pre_test_Score and Post_test_Score columns, and evaluate this new tibble object. To select columns in a tibble, we use the select() function.
df |>
  select(LMS_Engagement_Index, Pre_test_Score, Post_test_Score)
Even though we see these three columns, we cannot use them later because they are discarded after being printed.
In order to further use the outcome of running R code, you need to save the outcome using the assignment operator. Before running this command, we already have df2, which contains 6 columns. Since we are using the same object name, df2, the existing df2 will be replaced with the new result.
df2 <- df |>
  select(LMS_Engagement_Index, Pre_test_Score, Post_test_Score)
head(df2)
Now df2 has only three columns.
names(df)
## [1] "Student_ID" "Gender" "Access_Device"
## [4] "Instruction_Mode" "Pre_test_Score" "Post_test_Score"
## [7] "LMS_Engagement_Index"
names(df2)
## [1] "LMS_Engagement_Index" "Pre_test_Score" "Post_test_Score"
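select() is not limited to listing column names one by one. A sketch using the tidyselect helpers ends_with() and where(), plus a minus sign to drop columns, on a small made-up tibble that borrows a few column names from df (its values are invented):

```r
library(dplyr)

# Made-up tibble reusing some of df's column names (values are invented).
toy <- tibble(
  Student_ID           = 1:2,
  Gender               = c("F", "M"),
  Pre_test_Score       = c(73.5, 73.0),
  Post_test_Score      = c(80.1, 75.2),
  LMS_Engagement_Index = c(0.6, 0.4)
)

toy |> select(ends_with("_Score"))   # Pre_test_Score, Post_test_Score
toy |> select(-Student_ID, -Gender)  # a minus sign drops columns
toy |> select(where(is.numeric))     # every numeric column
```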
Original data (df)
df
df3 <- df |>
  slice(5:14)
df3
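slice() keeps rows by their position, so slice(5:14) keeps rows 5 through 14, which is 10 rows. A sketch on a made-up 20-row tibble, also showing slice_head() and negative positions:

```r
library(dplyr)

# Made-up tibble with 20 rows (not the course data).
toy <- tibble(Student_ID = 1:20, Score = seq(50, 88, by = 2))

toy |> slice(5:14)        # rows 5 through 14 (10 rows)
toy |> slice_head(n = 3)  # the first 3 rows
toy |> slice(-(1:15))     # negative positions drop rows: keeps rows 16 to 20
```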
Create a histogram of Pre_test_Score values.
We can use the template code on page 23 of the course notes.
We need to replace the column names in the template with the appropriate column names found in our tibble object.
df |>
  ggplot(aes(x = Pre_test_Score)) +
  geom_histogram(color = "black", fill = "white") +
  labs(x = "Pre_test_Score", y = "Count")
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
The message indicates that, to create the histogram, ggplot2 used the default of 30 bins, and it suggests picking a better value with binwidth.
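Following the suggestion in the message, you can set the bin width yourself with the binwidth argument of geom_histogram() (or set the number of bins with bins). A sketch with a handful of made-up scores; binwidth = 5 is an arbitrary choice for illustration, not a recommended value for this data:

```r
library(ggplot2)

# Made-up scores standing in for df (not the course data).
toy <- data.frame(
  Pre_test_Score = c(52.6, 59.5, 63.7, 67.6, 70.0, 72.1, 73.8, 79.7, 86.3)
)

# binwidth = 5: each bar covers 5 points of the score scale,
# so ggplot2 no longer falls back to the default of 30 bins.
p <- toy |>
  ggplot(aes(x = Pre_test_Score)) +
  geom_histogram(binwidth = 5, color = "black", fill = "white") +
  labs(x = "Pre_test_Score", y = "Count")

p
```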
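A very common mistake: the error message indicates that you put + at the beginning of a new line. In ggplot2 code, + must go at the end of a line, not at the start of the next one.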
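A sketch of the wrong and the right placement of +, using a small made-up data frame. The wrong version is commented out so the chunk still runs (an uncommented error would stop the Rmd from rendering):

```r
library(ggplot2)

# Made-up scores (not the course data).
toy <- data.frame(Pre_test_Score = c(63.7, 67.6, 70.0, 72.1, 79.7))

# Wrong: R treats the first line as a complete command, so the line
# that starts with + has nothing on its left side.
# ggplot(toy, aes(x = Pre_test_Score))
#   + geom_histogram(binwidth = 5)

# Right: end each line with +, so R knows the command continues.
p <- ggplot(toy, aes(x = Pre_test_Score)) +
  geom_histogram(binwidth = 5)

p
```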
- When your Rmd file contains a code chunk that cannot be executed, you won’t get a rendered HTML report.
- You need to make sure all code chunks in your Rmd file are runnable (in other words, they produce no errors).