R: Basic Commands
Load packages
- Install the tidyverse package (if you haven’t installed it yet)
- Load the tidyverse package
# install.packages("tidyverse") ## Uncomment this line if you haven't installed tidyverse yet
library(tidyverse)
Dataset
Let’s generate a dataset of students’ test scores. Start by creating some variables:
name <- c("Bob", "Chris", "John", "Joanna")
score <- c(16, 20, 18, 19)
hair_color <- c("Blue", "Orange", "Green", "Blue")
gender <- c("M", "F", "F", "M")
age <- c(33, 45, 26, 4) ## Lookit! Joanna is only four! A child prodigy!
- name, hair_color and gender are character variables
- score and age are numeric variables
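One way to check these types is with `class()`, which reports how R stores each vector. The short sketch below re-creates two of the vectors so that it runs on its own.

```r
# Re-create one character vector and one numeric vector from above
name <- c("Bob", "Chris", "John", "Joanna")
score <- c(16, 20, 18, 19)

class(name)        # "character"
class(score)       # "numeric"
is.numeric(score)  # TRUE
```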
Summary statistics
Let’s find some summary statistics for the score variable:
mean(score)
[1] 18.25
sd(score) ## standard deviation
[1] 1.707825
min(score)
[1] 16
max(score)
[1] 20
median(score)
[1] 18.5
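To see where the standard deviation comes from, it may help to compute it by hand. The sketch below applies the sample formula (dividing by n - 1), which is what `sd()` uses; the names `n`, `deviations` and `manual_sd` are only illustrative.

```r
score <- c(16, 20, 18, 19)

n <- length(score)                  # number of observations
deviations <- score - mean(score)   # distance of each score from the mean
manual_sd <- sqrt(sum(deviations^2) / (n - 1))

manual_sd                           # matches sd(score)
all.equal(manual_sd, sd(score))     # TRUE

# summary() reports several of these statistics in one call
summary(score)
```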
Create dataframe
Let’s create a dataframe from these variables and store it in an object called df1:
df1 <- data.frame(name, score, hair_color, gender, age)
df1
    name score hair_color gender age
1    Bob    16       Blue      M  33
2  Chris    20     Orange      F  45
3   John    18      Green      F  26
4 Joanna    19       Blue      M   4
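After building a dataframe, a quick check of its shape and column types can catch mistakes early. The sketch below rebuilds df1 so it is self-contained, then inspects it with base R functions.

```r
# Rebuild the dataframe from the tutorial so this block runs on its own
name <- c("Bob", "Chris", "John", "Joanna")
score <- c(16, 20, 18, 19)
hair_color <- c("Blue", "Orange", "Green", "Blue")
gender <- c("M", "F", "F", "M")
age <- c(33, 45, 26, 4)
df1 <- data.frame(name, score, hair_color, gender, age)

nrow(df1)   # 4 rows, one per student
ncol(df1)   # 5 columns, one per variable
names(df1)  # column names are taken from the vector names
str(df1)    # compact overview of each column's type and first values
```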
Extract a column
Let’s extract one variable (column) from the dataframe and find its mean:
df1$score
[1] 16 20 18 19
mean(df1$score)
[1] 18.25
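The `$` operator is not the only way to pull out a column. As a small illustration, the sketch below uses a cut-down version of df1 with two columns and shows two base R alternatives that return the same vector.

```r
# A cut-down version of df1, rebuilt here so the block runs on its own
df1 <- data.frame(
  name = c("Bob", "Chris", "John", "Joanna"),
  score = c(16, 20, 18, 19)
)

df1[["score"]]   # double brackets with the column name
df1[, "score"]   # row/column indexing: all rows, the score column

identical(df1$score, df1[["score"]])  # TRUE
mean(df1[["score"]])                  # 18.25
```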
Scatterplot
Let’s draw a scatterplot to see whether there is any relationship between age and score.
ggplot(data = df1,
       mapping = aes(x = age, y = score)) +
  geom_point()
EXERCISE
- Add 3 new students to the dataframe df1
- Obtain the summary statistics for the age variable
- Draw a scatterplot with age on the vertical axis and score on the horizontal axis.