R: Basic Commands

Published

September 6, 2025

Load packages

  • Install the tidyverse package (if you haven’t installed it yet)
  • Load the tidyverse package
# install.packages("tidyverse")  ## Uncomment this line if you haven't installed tidyverse yet
library(tidyverse)

Dataset

Let’s generate a dataset of students’ test scores. Start by creating one vector per variable:

name <- c("Bob", "Chris", "John", "Joanna")
score <- c(16, 20, 18, 19)
hair_color <- c("Blue", "Orange", "Green", "Blue")
gender <- c("M", "F", "F", "M")
age <- c(33, 45, 26, 4)   ## Lookit! Joanna is only four! A child prodigy!
  • name, hair_color and gender are character variables
  • score and age are numeric variables
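
If you want to confirm the type of a variable yourself, base R's class() and is.numeric() are one way to do it. The short sketch below re-creates two of the vectors above so that it runs on its own:

```r
# Re-create two of the vectors from above so this block is self-contained
name  <- c("Bob", "Chris", "John", "Joanna")
score <- c(16, 20, 18, 19)

class(name)        # "character"
class(score)       # "numeric"
is.numeric(score)  # TRUE
length(score)      # 4 (one value per student)
```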

Summary statistics

Let’s compute some summary statistics for the score variable:

mean(score)
[1] 18.25
sd(score)              ## standard deviation
[1] 1.707825
min(score)
[1] 16
max(score)
[1] 20
median(score)
[1] 18.5
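
To see what sd() is doing, here is a minimal sketch that computes the sample standard deviation by hand (dividing by n - 1, not n) and compares it with the built-in function. The names n, deviations and manual_sd are introduced here only for illustration. The last line uses summary(), which reports several of the statistics above in a single call.

```r
score <- c(16, 20, 18, 19)

n          <- length(score)          # number of observations
deviations <- score - mean(score)    # distance of each score from the mean

# Sample standard deviation: square root of the sum of squared
# deviations divided by (n - 1)
manual_sd <- sqrt(sum(deviations^2) / (n - 1))
manual_sd

# Should agree with the built-in function
all.equal(manual_sd, sd(score))      # TRUE

# Minimum, quartiles, median, mean and maximum in one call
summary(score)
```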

Create dataframe

Let’s combine these variables into a dataframe and store it in an object called df1:

df1 <- data.frame(name, score, hair_color, gender, age)
df1
    name score hair_color gender age
1    Bob    16       Blue      M  33
2  Chris    20     Orange      F  45
3   John    18      Green      F  26
4 Joanna    19       Blue      M   4
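
After creating a dataframe, it can help to check its shape and column types before going further. The sketch below uses only base R functions (nrow(), ncol(), names(), str()) and rebuilds df1 so that it runs on its own:

```r
# Rebuild df1 exactly as above so this block is self-contained
name       <- c("Bob", "Chris", "John", "Joanna")
score      <- c(16, 20, 18, 19)
hair_color <- c("Blue", "Orange", "Green", "Blue")
gender     <- c("M", "F", "F", "M")
age        <- c(33, 45, 26, 4)
df1 <- data.frame(name, score, hair_color, gender, age)

nrow(df1)   # 4 rows, one per student
ncol(df1)   # 5 columns, one per variable
names(df1)  # column names
str(df1)    # compact overview: type and first values of each column
```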

Extract a column

Let’s extract one variable (column) from the dataframe with the $ operator and find its mean:

df1$score
[1] 16 20 18 19
mean(df1$score)
[1] 18.25
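
The $ operator is not the only way to pull out a column. As a sketch, the base R alternatives below return the same numeric vector. Single brackets with just a column name behave differently: they return a one-column dataframe. A smaller two-column df1 is built here only to keep the example short.

```r
# A reduced version of df1, built here so the block runs on its own
df1 <- data.frame(
  name  = c("Bob", "Chris", "John", "Joanna"),
  score = c(16, 20, 18, 19)
)

by_dollar  <- df1$score        # as in the text above
by_double  <- df1[["score"]]   # double brackets with the column name
by_indices <- df1[, "score"]   # all rows, the column named "score"

identical(by_dollar, by_double)   # TRUE
identical(by_dollar, by_indices)  # TRUE

# Single brackets keep the dataframe structure
class(df1["score"])               # "data.frame"
```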

Scatterplot

Let’s draw a scatterplot to see whether there is any relationship between age and score:

ggplot(data = df1,
       mapping = aes(x = age, y = score)) + 
  geom_point()
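
A scatterplot shows the relationship visually. If you also want a number, one option (not part of the original walkthrough) is the Pearson correlation coefficient from base R's cor(). The sketch below assumes the same age and score values as above and stores the result in r, a name chosen only for this example:

```r
age   <- c(33, 45, 26, 4)
score <- c(16, 20, 18, 19)

# Pearson correlation: ranges from -1 to 1, with values near 0
# indicating little or no linear relationship
r <- cor(age, score)
r
```

With only four students the value is very close to zero, and such a small sample says little about any real relationship between age and score.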

EXERCISE

  1. Add 3 new students to the dataframe df1.
  2. Obtain the summary statistics for the age variable.
  3. Draw a scatterplot with age on the vertical axis and score on the horizontal axis.
