R: Basic Commands
Load packages
- Install the tidyverse package (if you haven’t installed it yet)
- Load the tidyverse package
# install.packages("tidyverse") ## Uncomment this line if you haven't installed tidyverse yet
library(tidyverse)
Dataset
Let’s generate a dataset showing test scores of students. Start by creating some variables:
name <- c("Bob", "Chris", "John", "Joanna")
score <- c(16, 20, 18, 19)
hair_color <- c("Blue", "Orange", "Green", "Blue")
gender <- c("M", "F", "F", "M")
age <- c(33, 45, 26, 4) ## Lookit! Joanna is only four! A child prodigy!
- name, hair_color and gender are character variables
- score and age are numeric variables
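One way to confirm these types is to ask R how it stores each vector. The following is a minimal sketch that re-creates the vectors from this section so it can run on its own; class(), is.numeric() and is.character() are base R functions.

```r
# Re-create the vectors from this section so the block is self-contained
name <- c("Bob", "Chris", "John", "Joanna")
score <- c(16, 20, 18, 19)
hair_color <- c("Blue", "Orange", "Green", "Blue")
gender <- c("M", "F", "F", "M")
age <- c(33, 45, 26, 4)

class(name)               # "character"
class(score)              # "numeric"
is.numeric(age)           # TRUE
is.character(hair_color)  # TRUE
is.character(gender)      # TRUE
```

Values wrapped in quotes become character vectors, while bare numbers become numeric vectors.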
Summary statistics
Let’s find some summary statistics for the score variable
mean(score)
[1] 18.25
sd(score) ## standard deviation
[1] 1.707825
min(score)
[1] 16
max(score)
[1] 20
median(score)
[1] 18.5
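To see where these numbers come from, the standard deviation and the median can be recomputed by hand. This is only an illustrative sketch; the helper names n, deviations, manual_sd, sorted_score and manual_median are made up for this example. Note that sd() divides by n - 1 (the sample standard deviation), not by n.

```r
score <- c(16, 20, 18, 19)
n <- length(score)

# Sample standard deviation: squared deviations from the mean,
# summed, divided by (n - 1), then square-rooted
deviations <- score - mean(score)
manual_sd <- sqrt(sum(deviations^2) / (n - 1))
manual_sd
all.equal(manual_sd, sd(score))  # TRUE

# Median of an even number of values: average of the two middle values
sorted_score <- sort(score)
manual_median <- mean(sorted_score[c(n / 2, n / 2 + 1)])
manual_median  # 18.5

# summary() reports several of these statistics in one call
summary(score)
```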
Create dataframe
Let’s create a dataframe and store it in an object called df1
df1 <- data.frame(name, score, hair_color, gender, age)
df1
    name score hair_color gender age
1    Bob    16       Blue      M  33
2  Chris    20     Orange      F  45
3   John    18      Green      F  26
4 Joanna    19       Blue      M   4
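Once the dataframe exists, a few base R functions can be used to check its shape and column types. The sketch below rebuilds df1 from the same vectors so that it runs on its own.

```r
# Rebuild the dataframe from this section
name <- c("Bob", "Chris", "John", "Joanna")
score <- c(16, 20, 18, 19)
hair_color <- c("Blue", "Orange", "Green", "Blue")
gender <- c("M", "F", "F", "M")
age <- c(33, 45, 26, 4)
df1 <- data.frame(name, score, hair_color, gender, age)

nrow(df1)   # number of rows (students): 4
ncol(df1)   # number of columns (variables): 5
names(df1)  # column names, taken from the vector names
str(df1)    # compact overview: type and first values of each column
```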
Extract a column
Let’s extract one variable (column) from the dataframe and find its mean
df1$score
[1] 16 20 18 19
mean(df1$score)
[1] 18.25
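The $ operator is not the only way to pull out a column. As a small sketch, base R also accepts double-bracket and row/column indexing, and for a plain data.frame like this one all three return the same numeric vector.

```r
# Rebuild the dataframe from this section
name <- c("Bob", "Chris", "John", "Joanna")
score <- c(16, 20, 18, 19)
hair_color <- c("Blue", "Orange", "Green", "Blue")
gender <- c("M", "F", "F", "M")
age <- c(33, 45, 26, 4)
df1 <- data.frame(name, score, hair_color, gender, age)

by_dollar  <- df1$score        # column by name with $
by_bracket <- df1[["score"]]   # column by name with [[ ]]
by_index   <- df1[, "score"]   # all rows, the "score" column

identical(by_dollar, by_bracket)  # TRUE
identical(by_dollar, by_index)    # TRUE
mean(by_bracket)                  # 18.25
```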
Scatterplot
Let’s draw a scatterplot. We want to see if there is any relationship between age and score
ggplot(data = df1,
       mapping = aes(x = age, y = score)) +
  geom_point()
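A scatterplot shows the relationship visually. As a possible numeric complement (not part of the original walkthrough), the base R function cor() computes the Pearson correlation between two numeric vectors. The variable name r is made up for this sketch, and with only four students the result should be read with caution.

```r
# Rebuild the two numeric columns used in the scatterplot
score <- c(16, 20, 18, 19)
age <- c(33, 45, 26, 4)
df1 <- data.frame(score, age)

# Pearson correlation: close to 0 suggests no linear relationship,
# close to 1 or -1 suggests a strong one
r <- cor(df1$age, df1$score)
r
```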
EXERCISE
- Add 3 new students to the dataframe df1
- Obtain the summary statistics for the age variable
- Draw a scatterplot with age on the vertical axis and score on the horizontal axis.
SP | 01-R-basic-commands.qmd