The first step is to open RStudio and click on the + button in the upper left corner.
Select R Markdown:
Now you are ready to start working!
Working in R Markdown makes coding much easier! What you need to know is quite simple:
# This is a chunk
Headlines
You can use R Markdown to create nice summaries of your analyses. You can also use HTML to set up headlines, change the font, etc.
Begin a headline with a single pound symbol (#) followed by a space, then write your headline text.
For subheadlines, use multiple pound symbols (##, ###, ####, etc.) to indicate the hierarchy level.
Each additional pound symbol represents a deeper level of hierarchy.
Ensure there is a space between the pound symbols and the text of the headline or subheadline.
Changing Font and Face
Use asterisks (*) or underscores (_) to emphasize text.
Add an * or _ before and after the word/sentence. Make sure there is no space between the symbol and the text. Examples: *italics* or _italics_
Add two ** or __ before and after the word/sentence. Make sure there is no space between the symbols and the text. Examples: **bold** or __bold__
Add three *** or ___ before and after the word/sentence. Make sure there is no space between the symbols and the text. Examples: ***bold and italics*** or ___bold and italics___
Additionally, you can use backticks to indicate code or monospace font (e.g., `code`)
You can also freely decide which chunks should be printed in the knitted document, which should appear but not run, etc.
Common chunk options include:
echo = FALSE to suppress the display of code within the document but still show its output.
eval = FALSE to prevent code execution.
include = FALSE to hide both code and output (the code still runs).
message = FALSE to suppress messages generated by code execution.
warning = FALSE to suppress warnings generated by code execution.
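These options can also be set once for the whole document instead of chunk by chunk. The lines below are a minimal sketch, assuming the knitr package (which R Markdown uses for knitting) is installed; the defaults chosen here are only an illustration, not a recommendation.

```r
# Typically placed in a first "setup" chunk of the .Rmd file
library(knitr)

# Defaults applied to every chunk that does not override them
opts_chunk$set(
  echo = TRUE,      # show the code
  message = FALSE,  # hide messages (e.g. from loading packages)
  warning = FALSE   # hide warnings
)

# Check what is currently set for one option
opts_chunk$get("message")
```

A single chunk can still override a default by stating the option in its own header.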
### Install packages
# install.packages("ggplot2")
# install.packages("ggridges")
# install.packages("tidyverse")
# install.packages("janitor")
# install.packages("kableExtra")
# install.packages("unikn")
# install.packages("ggpubr")
# install.packages("sjPlot")
# install.packages("lme4")
# install.packages("lmerTest")
# install.packages("car")
# install.packages("readxl")
### Load libraries
library(tidyverse)
library(ggplot2)
library(ggridges)
library(janitor) # helpful to clean col names `clean_names()`
library(kableExtra) # to display and edit tables
library(unikn) # for uni Konstanz theme
library(ggpubr) # to arrange plots
library(sjPlot) # to visualize model estimates/predicted values
library(lme4)
library(lmerTest)
library(car)
library(readxl) # read excel data
# set options for tables
bs_style <- c("striped", "hover", "condensed", "responsive")
options(kable_styling_bootstrap_options = bs_style)
It’s important that you save your .Rmd file in the same folder where your data are stored. This makes loading data much easier, because you won’t need to set a working directory.
Use the function read_csv("name_data") from the package readr (loaded as part of tidyverse) to load your data.
If your data are in .xlsx, you can use the function read_excel() from the package readxl.
### load data
read_csv("complete.csv") -> df
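If you want to try the loading step without the course data, the sketch below writes a small made-up table to a temporary .csv file and reads it back. The column names and values are invented for illustration; show_col_types = FALSE only silences the column-type message.

```r
library(readr)
library(tibble)

# Made-up data, written to a temporary file so the example is self-contained
toy <- tibble(ID = c("P01", "P02", "P03"),
              age = c(20.3, 14.2, 18.7),
              group = c("DYS", "TD", "TD"))
tmp <- tempfile(fileext = ".csv")
write_csv(toy, tmp)

# Read it back; read_csv() guesses the column types (<chr>, <dbl>, ...)
toy_read <- read_csv(tmp, show_col_types = FALSE)
toy_read
```

With your own data, only the read_csv() line is needed, with the file name in place of tmp.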
Structure of the dataset:
ID: ID participants
age
L1_code: Participants’ native language
education: Participants’ education level
name_school: Name of the school where participants were tested
class: School grade (1 to 5 + uni)
group: DYS = dyslexics; TD = typically developing participants
other_diagnoses: Other diagnoses (beyond dyslexia for DYS)
n_other_diagnoses: Number of other diagnoses (beyond dyslexia for DYS)
age_diagnosis: Age of dyslexia diagnosis
AoO: Age of Onset for English (L2)
wr_time_z: Word reading time (z score)
wr_error_z: Word reading errors (z score)
wr_syl_z: Syllables per second in word reading (z score)
nwr_time_z: Nonword reading time (z score)
nwr_error_z: Nonword reading errors (z score)
nwr_syl_z: Syllables per second in nonword reading (z score)
group.exclusion: Another categorization of group (ignore this)
pa.rt, pa.acc, pa.bis: Phonological awareness measures (RT, accuracy and speed-accuracy trade-off)
forward and backward: memory tests
lexita.acc, lexita.rt, lexita.bis: Italian vocabulary (accuracy, RT and speed-accuracy trade-off)
lextale.acc, lextale.rt, lextale.bis: English vocabulary (accuracy, RT and speed-accuracy trade-off)
it.ok.acc: Accuracy in Italian orthographic knowledge test
en.ok.acc, en.ok.rt, en.ok.bis: English orthographic knowledge (accuracy, RT and speed-accuracy trade-off)
va.span.acc, va.span.d, va.span.c: Visual attention span measures
eng.prof: Self-assessed English proficiency
eng.use: Self-assessed English use
eng.read: Self-assessed English reading exposure
it.read: Self-assessed Italian reading exposure
select
Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right) or type (e.g. where(is.numeric) selects all numeric columns).
Tidyverse selections implement a dialect of R where operators make it easy to select variables:
: for selecting a range of consecutive variables.
! for taking the complement of a set of variables.
& and | for selecting the intersection or the union of two sets of variables.
c() for combining selections.
Examples: Let’s say we want to select the first three columns in our dataset
df %>% dplyr::select(1:3)
## # A tibble: 94 × 3
## ID age L1_code
## <chr> <dbl> <chr>
## 1 LAUVEN 20.3 IT
## 2 LEVEN 20.5 IT
## 3 LAE02 14.2 IT
## 4 LAE03 14.6 IT
## 5 LAE04 18.7 IT
## 6 LAE05 17.4 IT and ROM (HL)
## 7 LAE06 14.3 IT
## 8 LAE07 14.9 IT
## 9 LAE08 14.1 IT
## 10 LAE09 17.1 IT
## # ℹ 84 more rows
Here, we want to select the columns ID, age, and the reading measures: wr_time_z, wr_error_z, nwr_time_z, nwr_error_z
### Option 1: type all cols names
df %>% dplyr::select(ID, age, wr_time_z, wr_error_z, nwr_time_z, nwr_error_z)
## # A tibble: 94 × 6
## ID age wr_time_z wr_error_z nwr_time_z nwr_error_z
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 LAUVEN 20.3 -0.58 -1.23 -1.05 -1.45
## 2 LEVEN 20.5 -0.23 0.1 -1.05 0.43
## 3 LAE02 14.2 -0.97 0.59 -0.34 0.99
## 4 LAE03 14.6 1.65 0.92 0.63 0.18
## 5 LAE04 18.7 -4.58 1.04 -0.99 -0.43
## 6 LAE05 17.4 -1.42 0.45 -0.03 -1.23
## 7 LAE06 14.3 0.42 0.59 0.04 0.72
## 8 LAE07 14.9 -1.4 -0.39 -1.17 0.45
## 9 LAE08 14.1 -0.33 -0.39 -0.44 -0.36
## 10 LAE09 17.1 0.32 1.04 0.46 1.15
## # ℹ 84 more rows
### Option 2: Since the reading measures are consecutive columns, we can also select them as a range:
df %>% dplyr::select(ID, age, wr_time_z:nwr_error_z)
## # A tibble: 94 × 6
## ID age wr_time_z wr_error_z nwr_time_z nwr_error_z
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 LAUVEN 20.3 -0.58 -1.23 -1.05 -1.45
## 2 LEVEN 20.5 -0.23 0.1 -1.05 0.43
## 3 LAE02 14.2 -0.97 0.59 -0.34 0.99
## 4 LAE03 14.6 1.65 0.92 0.63 0.18
## 5 LAE04 18.7 -4.58 1.04 -0.99 -0.43
## 6 LAE05 17.4 -1.42 0.45 -0.03 -1.23
## 7 LAE06 14.3 0.42 0.59 0.04 0.72
## 8 LAE07 14.9 -1.4 -0.39 -1.17 0.45
## 9 LAE08 14.1 -0.33 -0.39 -0.44 -0.36
## 10 LAE09 17.1 0.32 1.04 0.46 1.15
## # ℹ 84 more rows
In the following example we want to select (1) all the numeric columns and (2) all the character columns
df %>% dplyr::select(where(is.numeric))
## # A tibble: 94 × 29
## age n_other_diagnoses AoO wr_time_z wr_error_z nwr_time_z nwr_error_z
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 20.3 1 8 -0.58 -1.23 -1.05 -1.45
## 2 20.5 NA 6 -0.23 0.1 -1.05 0.43
## 3 14.2 NA 6 -0.97 0.59 -0.34 0.99
## 4 14.6 NA 5 1.65 0.92 0.63 0.18
## 5 18.7 NA 4 -4.58 1.04 -0.99 -0.43
## 6 17.4 NA 5 -1.42 0.45 -0.03 -1.23
## 7 14.3 NA 5 0.42 0.59 0.04 0.72
## 8 14.9 NA 6 -1.4 -0.39 -1.17 0.45
## 9 14.1 NA 6 -0.33 -0.39 -0.44 -0.36
## 10 17.1 NA 6 0.32 1.04 0.46 1.15
## # ℹ 84 more rows
## # ℹ 22 more variables: pa.rt <dbl>, pa.acc <dbl>, pa.bis <dbl>, forward <dbl>,
## # backward <dbl>, lexita.acc <dbl>, lexita.rt <dbl>, lexita.bis <dbl>,
## # lextale.acc <dbl>, lextale.rt <dbl>, lextale.bis <dbl>, it.ok.acc <dbl>,
## # en.ok.acc <dbl>, en.ok.rt <dbl>, en.ok.bis <dbl>, va.span.acc <dbl>,
## # va.span.d <dbl>, va.span.c <dbl>, eng.prof <dbl>, eng.use <dbl>,
## # eng.read <dbl>, it.read <dbl>
df %>% dplyr::select(where(is.character))
## # A tibble: 94 × 9
## ID L1_code education name_school class group other_diagnoses age_diagnosis
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 LAUV… IT BA university uni DYS disortografia 11
## 2 LEVEN IT BA university uni TD <NA> -
## 3 LAE02 IT HS LAE 1 TD <NA> -
## 4 LAE03 IT HS LAE 1 TD <NA> -
## 5 LAE04 IT HS LAE 4 DYS <NA> 17
## 6 LAE05 IT and… HS LAE 4 TD <NA> -
## 7 LAE06 IT HS LAE 1 TD <NA> -
## 8 LAE07 IT HS LAE 1 TD <NA> -
## 9 LAE08 IT HS LAE 1 TD <NA> -
## 10 LAE09 IT HS LAE 4 TD <NA> -
## # ℹ 84 more rows
## # ℹ 1 more variable: group.exclusion <chr>
For more information, run the following code, or type select in the help section and select dplyr::select
??dplyr::select
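The selection operators listed earlier (:, !, &, c()) can be tried out on a small made-up tibble. This is only a sketch: the values are invented and just the column names mirror the dataset.

```r
library(dplyr)

# Toy data: invented values, column names borrowed from the dataset
toy <- tibble(ID = c("P01", "P02", "P03"),
              age = c(20.3, 14.2, 18.7),
              group = c("DYS", "TD", "TD"),
              wr_time_z = c(-0.58, -0.97, -4.58),
              wr_error_z = c(-1.23, 0.59, 1.04))

# `:` selects a range of consecutive columns
sel_range <- toy %>% dplyr::select(ID:group)

# `!` takes the complement: everything except age
sel_not <- toy %>% dplyr::select(!age)

# c() combines selections, here a single name and a range
sel_comb <- toy %>% dplyr::select(c(ID, wr_time_z:wr_error_z))

# `&` intersects two selections: numeric columns that are not age
sel_and <- toy %>% dplyr::select(where(is.numeric) & !age)

names(sel_and)
```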
filter
The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions.
Filter operators
There are many functions and operators that are useful when constructing the expressions used to filter the data:
== : Values equal to
!= : Values different from
> and >= : Values greater than / greater than or equal to
< and <= : Values smaller than / smaller than or equal to
& : and
| : or
! : not (negates a condition)
is.na() : is NA
!is.na() : is not NA
%in% : Values contained in the specified c()
Examples
Keep the rows where group == "DYS":
df %>% filter(group == "DYS")
## # A tibble: 32 × 38
## ID age L1_code education name_school class group other_diagnoses
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 LAUVEN 20.3 IT BA university uni DYS disortografia
## 2 LAE04 18.7 IT HS LAE 4 DYS <NA>
## 3 LAE16 16.6 IT HS LAE 3 DYS <NA>
## 4 LAE17 17.1 IT HS LAE 3 DYS <NA>
## 5 LAE45 19 IT HS LAE 5 DYS discalculia
## 6 LAE46 18.5 IT HS LAE 5 DYS disgrafia
## 7 LAE47 18.1 IT HS LAE 5 DYS discalculia
## 8 LC03 16.4 IT HS LC 3 DYS <NA>
## 9 LC06 19.2 IT HS LC 5 DYS disgrafia
## 10 LC11 16.6 IT HS LC 3 DYS disgrafia
## # ℹ 22 more rows
## # ℹ 30 more variables: n_other_diagnoses <dbl>, age_diagnosis <chr>, AoO <dbl>,
## # wr_time_z <dbl>, wr_error_z <dbl>, nwr_time_z <dbl>, nwr_error_z <dbl>,
## # group.exclusion <chr>, pa.rt <dbl>, pa.acc <dbl>, pa.bis <dbl>,
## # forward <dbl>, backward <dbl>, lexita.acc <dbl>, lexita.rt <dbl>,
## # lexita.bis <dbl>, lextale.acc <dbl>, lextale.rt <dbl>, lextale.bis <dbl>,
## # it.ok.acc <dbl>, en.ok.acc <dbl>, en.ok.rt <dbl>, en.ok.bis <dbl>, …
Keep the rows where group == "TD" AND n_other_diagnoses is NA:
df %>% filter(group == "TD" & is.na(n_other_diagnoses))
## # A tibble: 59 × 38
## ID age L1_code education name_school class group other_diagnoses
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 LEVEN 20.5 IT BA university uni TD <NA>
## 2 LAE02 14.2 IT HS LAE 1 TD <NA>
## 3 LAE03 14.6 IT HS LAE 1 TD <NA>
## 4 LAE05 17.4 IT and ROM (HL) HS LAE 4 TD <NA>
## 5 LAE06 14.3 IT HS LAE 1 TD <NA>
## 6 LAE07 14.9 IT HS LAE 1 TD <NA>
## 7 LAE08 14.1 IT HS LAE 1 TD <NA>
## 8 LAE09 17.1 IT HS LAE 4 TD <NA>
## 9 LAE10 17.7 IT HS LAE 4 TD <NA>
## 10 LAE11 15.4 IT HS LAE 2 TD <NA>
## # ℹ 49 more rows
## # ℹ 30 more variables: n_other_diagnoses <dbl>, age_diagnosis <chr>, AoO <dbl>,
## # wr_time_z <dbl>, wr_error_z <dbl>, nwr_time_z <dbl>, nwr_error_z <dbl>,
## # group.exclusion <chr>, pa.rt <dbl>, pa.acc <dbl>, pa.bis <dbl>,
## # forward <dbl>, backward <dbl>, lexita.acc <dbl>, lexita.rt <dbl>,
## # lexita.bis <dbl>, lextale.acc <dbl>, lextale.rt <dbl>, lextale.bis <dbl>,
## # it.ok.acc <dbl>, en.ok.acc <dbl>, en.ok.rt <dbl>, en.ok.bis <dbl>, …
Keep the rows where name_school is "LAE" OR "LC":
### Option 1
df %>% filter(name_school == "LAE" | name_school == "LC")
## # A tibble: 56 × 38
## ID age L1_code education name_school class group other_diagnoses
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 LAE02 14.2 IT HS LAE 1 TD <NA>
## 2 LAE03 14.6 IT HS LAE 1 TD <NA>
## 3 LAE04 18.7 IT HS LAE 4 DYS <NA>
## 4 LAE05 17.4 IT and ROM (HL) HS LAE 4 TD <NA>
## 5 LAE06 14.3 IT HS LAE 1 TD <NA>
## 6 LAE07 14.9 IT HS LAE 1 TD <NA>
## 7 LAE08 14.1 IT HS LAE 1 TD <NA>
## 8 LAE09 17.1 IT HS LAE 4 TD <NA>
## 9 LAE10 17.7 IT HS LAE 4 TD <NA>
## 10 LAE11 15.4 IT HS LAE 2 TD <NA>
## # ℹ 46 more rows
## # ℹ 30 more variables: n_other_diagnoses <dbl>, age_diagnosis <chr>, AoO <dbl>,
## # wr_time_z <dbl>, wr_error_z <dbl>, nwr_time_z <dbl>, nwr_error_z <dbl>,
## # group.exclusion <chr>, pa.rt <dbl>, pa.acc <dbl>, pa.bis <dbl>,
## # forward <dbl>, backward <dbl>, lexita.acc <dbl>, lexita.rt <dbl>,
## # lexita.bis <dbl>, lextale.acc <dbl>, lextale.rt <dbl>, lextale.bis <dbl>,
## # it.ok.acc <dbl>, en.ok.acc <dbl>, en.ok.rt <dbl>, en.ok.bis <dbl>, …
### Option 2
df %>% filter(name_school %in% c("LAE", "LC") )
## # A tibble: 56 × 38
## ID age L1_code education name_school class group other_diagnoses
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 LAE02 14.2 IT HS LAE 1 TD <NA>
## 2 LAE03 14.6 IT HS LAE 1 TD <NA>
## 3 LAE04 18.7 IT HS LAE 4 DYS <NA>
## 4 LAE05 17.4 IT and ROM (HL) HS LAE 4 TD <NA>
## 5 LAE06 14.3 IT HS LAE 1 TD <NA>
## 6 LAE07 14.9 IT HS LAE 1 TD <NA>
## 7 LAE08 14.1 IT HS LAE 1 TD <NA>
## 8 LAE09 17.1 IT HS LAE 4 TD <NA>
## 9 LAE10 17.7 IT HS LAE 4 TD <NA>
## 10 LAE11 15.4 IT HS LAE 2 TD <NA>
## # ℹ 46 more rows
## # ℹ 30 more variables: n_other_diagnoses <dbl>, age_diagnosis <chr>, AoO <dbl>,
## # wr_time_z <dbl>, wr_error_z <dbl>, nwr_time_z <dbl>, nwr_error_z <dbl>,
## # group.exclusion <chr>, pa.rt <dbl>, pa.acc <dbl>, pa.bis <dbl>,
## # forward <dbl>, backward <dbl>, lexita.acc <dbl>, lexita.rt <dbl>,
## # lexita.bis <dbl>, lextale.acc <dbl>, lextale.rt <dbl>, lextale.bis <dbl>,
## # it.ok.acc <dbl>, en.ok.acc <dbl>, en.ok.rt <dbl>, en.ok.bis <dbl>, …
Run the following code for more information about filter, or type filter in the help section
??dplyr::filter
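Some of the operators from the list (!=, >=, !is.na()) do not appear in the examples above. The sketch below tries them on a made-up tibble with invented values. It also shows one thing to keep in mind: a comparison with NA gives NA rather than TRUE, so filter() drops those rows as well.

```r
library(dplyr)

# Toy data: invented values, column names borrowed from the dataset
toy <- tibble(ID = c("P01", "P02", "P03", "P04"),
              age = c(20.3, 14.2, 18.7, 16.0),
              group = c("DYS", "TD", "TD", "DYS"),
              other_diagnoses = c("disgrafia", NA, NA, NA))

# != keeps rows whose value differs from the one given
not_dys <- toy %>% filter(group != "DYS")

# >= combined with & : both conditions must be TRUE
older_td <- toy %>% filter(age >= 16 & group == "TD")

# !is.na() keeps rows where the value is not missing
with_diag <- toy %>% filter(!is.na(other_diagnoses))

# NA != "disgrafia" is NA, not TRUE, so the three NA rows are dropped too
no_disgrafia <- toy %>% filter(other_diagnoses != "disgrafia")
nrow(no_disgrafia)
```

To keep the rows with a missing value in the last case, the condition can be extended with | is.na(other_diagnoses).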
mutate
mutate() creates new columns that are functions of existing variables. It can also modify (if the name is the same as an existing column) and delete columns (by setting their value to NULL).
Examples:
Change the type of the column age to numeric (double):
### Option 1
df %>%
mutate(age = as.double(age))
## # A tibble: 94 × 38
## ID age L1_code education name_school class group other_diagnoses
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 LAUVEN 20.3 IT BA university uni DYS disortografia
## 2 LEVEN 20.5 IT BA university uni TD <NA>
## 3 LAE02 14.2 IT HS LAE 1 TD <NA>
## 4 LAE03 14.6 IT HS LAE 1 TD <NA>
## 5 LAE04 18.7 IT HS LAE 4 DYS <NA>
## 6 LAE05 17.4 IT and ROM (H… HS LAE 4 TD <NA>
## 7 LAE06 14.3 IT HS LAE 1 TD <NA>
## 8 LAE07 14.9 IT HS LAE 1 TD <NA>
## 9 LAE08 14.1 IT HS LAE 1 TD <NA>
## 10 LAE09 17.1 IT HS LAE 4 TD <NA>
## # ℹ 84 more rows
## # ℹ 30 more variables: n_other_diagnoses <dbl>, age_diagnosis <chr>, AoO <dbl>,
## # wr_time_z <dbl>, wr_error_z <dbl>, nwr_time_z <dbl>, nwr_error_z <dbl>,
## # group.exclusion <chr>, pa.rt <dbl>, pa.acc <dbl>, pa.bis <dbl>,
## # forward <dbl>, backward <dbl>, lexita.acc <dbl>, lexita.rt <dbl>,
## # lexita.bis <dbl>, lextale.acc <dbl>, lextale.rt <dbl>, lextale.bis <dbl>,
## # it.ok.acc <dbl>, en.ok.acc <dbl>, en.ok.rt <dbl>, en.ok.bis <dbl>, …
### Option 2
df %>%
mutate(age = as.numeric(age))
## # A tibble: 94 × 38
## ID age L1_code education name_school class group other_diagnoses
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 LAUVEN 20.3 IT BA university uni DYS disortografia
## 2 LEVEN 20.5 IT BA university uni TD <NA>
## 3 LAE02 14.2 IT HS LAE 1 TD <NA>
## 4 LAE03 14.6 IT HS LAE 1 TD <NA>
## 5 LAE04 18.7 IT HS LAE 4 DYS <NA>
## 6 LAE05 17.4 IT and ROM (H… HS LAE 4 TD <NA>
## 7 LAE06 14.3 IT HS LAE 1 TD <NA>
## 8 LAE07 14.9 IT HS LAE 1 TD <NA>
## 9 LAE08 14.1 IT HS LAE 1 TD <NA>
## 10 LAE09 17.1 IT HS LAE 4 TD <NA>
## # ℹ 84 more rows
## # ℹ 30 more variables: n_other_diagnoses <dbl>, age_diagnosis <chr>, AoO <dbl>,
## # wr_time_z <dbl>, wr_error_z <dbl>, nwr_time_z <dbl>, nwr_error_z <dbl>,
## # group.exclusion <chr>, pa.rt <dbl>, pa.acc <dbl>, pa.bis <dbl>,
## # forward <dbl>, backward <dbl>, lexita.acc <dbl>, lexita.rt <dbl>,
## # lexita.bis <dbl>, lextale.acc <dbl>, lextale.rt <dbl>, lextale.bis <dbl>,
## # it.ok.acc <dbl>, en.ok.acc <dbl>, en.ok.rt <dbl>, en.ok.bis <dbl>, …
Create a new column (school_type), which follows these conditions: if name_school is "LAE", then school_type is "Linguistic"; otherwise it is "Other":
df %>%
mutate(school_type = if_else(name_school == "LAE", "Linguistic", "Other")) %>%
### Check the result
filter(ID %in% c("LAE03", "LC01", "MEN01", "VER01", "LEVEN")) %>%
dplyr::select(name_school, school_type)
## # A tibble: 5 × 2
## name_school school_type
## <chr> <chr>
## 1 university Other
## 2 LAE Linguistic
## 3 LC Other
## 4 MEN Other
## 5 VER Other
The if_else function works as follows:
if_else(CONDITION,
VALUE IF CONDITION IS TRUE,
VALUE IF CONDITION IS FALSE)
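As a small illustration of this pattern, the sketch below applies if_else to a made-up numeric vector (the values and labels are invented). It also shows what happens when the condition is NA, and the optional missing argument for that case.

```r
library(dplyr)

# Invented scores; the third one is missing
score <- c(1, 5, NA)

# Where the condition is NA, the result is NA as well
label <- if_else(score > 3, "high", "low")

# `missing` supplies a value for the NA cases
label_filled <- if_else(score > 3, "high", "low", missing = "unknown")

label
label_filled
```

Note that the TRUE and FALSE values need to be of compatible types, e.g. both character as here.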
It is also possible to use case_when when we have multiple conditions. Here, we have 5 levels under the name_school column: "LAE", "LC", "MEN", "VER", "university".
case_when works as follows:
case_when(
CONDITION ~ VALUE,
CONDITION ~ VALUE,
...
)
Let’s see a concrete example:
If name_school is "LAE", school_type is "Linguistic"; if it is "LC", "Scientific"; if it is "MEN", "Agrarian"; if it is "VER", "university"; if name_school is already "university", do not change it:
df %>%
mutate(school_type = case_when(
name_school == "LAE" ~ "Linguistic",
name_school == "LC" ~ "Scientific",
name_school == "MEN" ~ "Agrarian",
name_school == "VER" ~ "university",
TRUE ~ name_school
)) %>%
### check
filter(ID %in% c("LAE03", "LC01", "MEN01", "VER01", "LEVEN")) %>%
dplyr::select(name_school, school_type)
## # A tibble: 5 × 2
## name_school school_type
## <chr> <chr>
## 1 university university
## 2 LAE Linguistic
## 3 LC Scientific
## 4 MEN Agrarian
## 5 VER university
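case_when checks the conditions from top to bottom and uses the first one that is TRUE, so the order matters. The sketch below illustrates this with made-up ages and invented labels. Because the final TRUE line also catches missing values, NA is handled explicitly in the first condition.

```r
library(dplyr)

# Invented ages; the last one is missing
age <- c(14.2, 17.4, 20.3, NA)

age_band <- case_when(
  is.na(age) ~ NA_character_,  # keep missing ages missing
  age < 16   ~ "under 16",
  age < 19   ~ "16 to 18",     # only reached if age >= 16
  TRUE       ~ "19 or older"   # everything that is left
)

age_band
```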
mutate_all
You can use mutate_all to mutate all columns in a dataset. For instance, let’s select the four reading columns (wr_time_z, wr_error_z, nwr_time_z, nwr_error_z) and scale all of them:
df %>%
dplyr::select(wr_time_z:nwr_error_z) %>%
mutate_all(~scale(.x))
## # A tibble: 94 × 4
## wr_time_z[,1] wr_error_z[,1] nwr_time_z[,1] nwr_error_z[,1]
## <dbl> <dbl> <dbl> <dbl>
## 1 0.546 -0.115 0.156 -0.105
## 2 0.680 0.527 0.156 0.683
## 3 0.396 0.763 0.420 0.918
## 4 1.40 0.923 0.781 0.578
## 5 -0.994 0.981 0.178 0.322
## 6 0.222 0.696 0.535 -0.0131
## 7 0.931 0.763 0.561 0.805
## 8 0.230 0.290 0.112 0.691
## 9 0.642 0.290 0.383 0.352
## 10 0.892 0.981 0.717 0.985
## # ℹ 84 more rows
mutate_if
mutate_if is used to mutate all variables that meet a specific condition. For instance, let’s convert all character variables into factors:
### Initial df
df %>% select(where(is.character))
## # A tibble: 94 × 9
## ID L1_code education name_school class group other_diagnoses age_diagnosis
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 LAUV… IT BA university uni DYS disortografia 11
## 2 LEVEN IT BA university uni TD <NA> -
## 3 LAE02 IT HS LAE 1 TD <NA> -
## 4 LAE03 IT HS LAE 1 TD <NA> -
## 5 LAE04 IT HS LAE 4 DYS <NA> 17
## 6 LAE05 IT and… HS LAE 4 TD <NA> -
## 7 LAE06 IT HS LAE 1 TD <NA> -
## 8 LAE07 IT HS LAE 1 TD <NA> -
## 9 LAE08 IT HS LAE 1 TD <NA> -
## 10 LAE09 IT HS LAE 4 TD <NA> -
## # ℹ 84 more rows
## # ℹ 1 more variable: group.exclusion <chr>
### Mutation of <chr> variables into <fct>
df %>% mutate_if(is.character, as.factor)
## # A tibble: 94 × 38
## ID age L1_code education name_school class group other_diagnoses
## <fct> <dbl> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 LAUVEN 20.3 IT BA university uni DYS disortografia
## 2 LEVEN 20.5 IT BA university uni TD <NA>
## 3 LAE02 14.2 IT HS LAE 1 TD <NA>
## 4 LAE03 14.6 IT HS LAE 1 TD <NA>
## 5 LAE04 18.7 IT HS LAE 4 DYS <NA>
## 6 LAE05 17.4 IT and ROM (H… HS LAE 4 TD <NA>
## 7 LAE06 14.3 IT HS LAE 1 TD <NA>
## 8 LAE07 14.9 IT HS LAE 1 TD <NA>
## 9 LAE08 14.1 IT HS LAE 1 TD <NA>
## 10 LAE09 17.1 IT HS LAE 4 TD <NA>
## # ℹ 84 more rows
## # ℹ 30 more variables: n_other_diagnoses <dbl>, age_diagnosis <fct>, AoO <dbl>,
## # wr_time_z <dbl>, wr_error_z <dbl>, nwr_time_z <dbl>, nwr_error_z <dbl>,
## # group.exclusion <fct>, pa.rt <dbl>, pa.acc <dbl>, pa.bis <dbl>,
## # forward <dbl>, backward <dbl>, lexita.acc <dbl>, lexita.rt <dbl>,
## # lexita.bis <dbl>, lextale.acc <dbl>, lextale.rt <dbl>, lextale.bis <dbl>,
## # it.ok.acc <dbl>, en.ok.acc <dbl>, en.ok.rt <dbl>, en.ok.bis <dbl>, …
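In current dplyr versions, mutate_all and mutate_if are superseded by across(), although they still work. The sketch below shows one way the two examples above could be written with across(), using a made-up tibble with invented values. Wrapping scale() in as.numeric() is an addition here: scale() returns a one-column matrix, which is why the printed column names above end in [,1].

```r
library(dplyr)

# Toy data: invented values
toy <- tibble(ID = c("P01", "P02", "P03"),
              group = c("DYS", "TD", "TD"),
              wr_time_z = c(1, 2, 3),
              nwr_time_z = c(10, 20, 30))

# Counterpart of mutate_if(is.character, as.factor)
toy_fct <- toy %>% mutate(across(where(is.character), as.factor))

# Counterpart of mutate_all(~scale(.x)) on the numeric columns;
# as.numeric() turns the matrix returned by scale() into a plain vector
toy_scaled <- toy %>%
  dplyr::select(where(is.numeric)) %>%
  mutate(across(everything(), ~ as.numeric(scale(.x))))

toy_fct
toy_scaled
```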
mutate across multiple columns
You can combine the function mutate with the function across to apply the same mutation across more than one column. If you select them with the : operator, the columns should be next to each other.
For instance, we want to mutate the three columns pa.rt, pa.acc and pa.bis and we want to scale them using the function scale:
df %>% mutate(across(pa.rt:pa.bis, ~scale(.x))) %>%
### check
dplyr::select(pa.rt:pa.bis)
## # A tibble: 94 × 3
## pa.rt[,1] pa.acc[,1] pa.bis[,1]
## <dbl> <dbl> <dbl>
## 1 -0.615 0.633 0.688
## 2 -0.642 0.633 0.703
## 3 -0.442 0.996 0.793
## 4 -0.939 0.996 1.07
## 5 -0.855 0.814 0.920
## 6 -1.00 0.996 1.10
## 7 -1.09 0.633 0.948
## 8 -0.794 0.270 0.586
## 9 -0.707 0.814 0.839
## 10 -0.972 0.996 1.08
## # ℹ 84 more rows
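When the columns are not next to each other, they can be listed inside across() with c(). The sketch below uses a made-up tibble with invented values. The .names argument is optional: here it is used to add the scaled values as new columns instead of overwriting the originals, and the suffix _scaled is just an illustrative choice.

```r
library(dplyr)

# Toy data: pa.rt and pa.acc are separated by another column
toy <- tibble(ID = c("P01", "P02", "P03"),
              pa.rt = c(100, 200, 300),
              age = c(14, 17, 20),
              pa.acc = c(2, 4, 6))

toy_new <- toy %>%
  mutate(across(c(pa.rt, pa.acc),
                ~ as.numeric(scale(.x)),   # plain vector instead of a matrix
                .names = "{.col}_scaled")) # new column name: old name + suffix

names(toy_new)
```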
group_by
Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed “by group”. ungroup() removes grouping. It is very important to ungroup after the manipulation is applied.
Imagine we want to create a new column using the function mutate, which has the mean value of lexita.acc by group (DYS vs. TD):
df %>%
### First, we group the df as we need it, here, by group
group_by(group) %>%
mutate(mean.lexita = mean(lexita.acc)) %>%
### ALWAYS ungroup
ungroup() %>%
### check result
dplyr::select(ID, group, lexita.acc, mean.lexita)
## # A tibble: 94 × 4
## ID group lexita.acc mean.lexita
## <chr> <chr> <dbl> <dbl>
## 1 LAUVEN DYS 48 52.7
## 2 LEVEN TD 58 56.8
## 3 LAE02 TD 57 56.8
## 4 LAE03 TD 57 56.8
## 5 LAE04 DYS 55 52.7
## 6 LAE05 TD 58 56.8
## 7 LAE06 TD 59 56.8
## 8 LAE07 TD 56 56.8
## 9 LAE08 TD 56 56.8
## 10 LAE09 TD 56 56.8
## # ℹ 84 more rows
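To see why ungrouping matters, the sketch below uses a made-up tibble with invented scores. After group_by() the data stay grouped, so any later mean() is still computed per group; only after ungroup() does mean() use all rows.

```r
library(dplyr)

# Toy data: invented scores
toy <- tibble(ID = c("P01", "P02", "P03", "P04"),
              group = c("DYS", "DYS", "TD", "TD"),
              lexita.acc = c(48, 55, 58, 57))

# Grouped mutate: one mean per group, repeated on each row of the group
grouped <- toy %>%
  group_by(group) %>%
  mutate(mean.group = mean(lexita.acc))

# The result is still grouped
is_grouped_df(grouped)

# After ungroup(), mean() is computed over all rows
result <- grouped %>%
  ungroup() %>%
  mutate(mean.all = mean(lexita.acc))

result
```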
summarize or summarise
summarise() creates a new data frame. It returns one row for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
Let’s visualize the mean, sd and range of the lexita.acc score for each group in a table using summarize:
### Option 1: Specify the grouping variable using `group_by`
df %>%
  group_by(group) %>%
  summarize(
    mean = mean(lexita.acc),
    sd = sd(lexita.acc),
    ### range() returns two values (min and max) and would produce two rows per group,
    ### so the two ends of the range are computed as separate columns
    min = min(lexita.acc),
    max = max(lexita.acc)
  )
## # A tibble: 2 × 5
## group mean sd min max
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 DYS 52.7 7.89 17 60
## 2 TD 56.8 2.34 47 60
### Option 2: Specify the grouping variable within the summarize function using .by = ""
df %>%
  summarize(
    mean = mean(lexita.acc),
    sd = sd(lexita.acc),
    min = min(lexita.acc),
    max = max(lexita.acc),
    .by = "group"
  )
## # A tibble: 2 × 5
## group mean sd min max
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 DYS 52.7 7.89 17 60
## 2 TD 56.8 2.34 47 60
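Summary tables often also need the number of observations per group, and mean() returns NA as soon as one value is missing. The sketch below shows one way to handle both, using a made-up tibble with invented scores; the column names n and n_missing are illustrative choices.

```r
library(dplyr)

# Toy data: one invented score is missing
toy <- tibble(group = c("DYS", "DYS", "DYS", "TD", "TD"),
              lexita.acc = c(48, NA, 55, 58, 57))

toy_summary <- toy %>%
  summarize(
    n = n(),                               # rows per group
    n_missing = sum(is.na(lexita.acc)),    # missing scores per group
    mean = mean(lexita.acc, na.rm = TRUE), # ignore NA when averaging
    .by = group
  )

toy_summary
```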
rename
rename() changes the names of individual variables using new_name = old_name syntax.
Example: We want to change the original ID column into ID_participant:
df %>% rename(ID_participant = ID)
## # A tibble: 94 × 38
## ID_participant age L1_code education name_school class group
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 LAUVEN 20.3 IT BA university uni DYS
## 2 LEVEN 20.5 IT BA university uni TD
## 3 LAE02 14.2 IT HS LAE 1 TD
## 4 LAE03 14.6 IT HS LAE 1 TD
## 5 LAE04 18.7 IT HS LAE 4 DYS
## 6 LAE05 17.4 IT and ROM (HL) HS LAE 4 TD
## 7 LAE06 14.3 IT HS LAE 1 TD
## 8 LAE07 14.9 IT HS LAE 1 TD
## 9 LAE08 14.1 IT HS LAE 1 TD
## 10 LAE09 17.1 IT HS LAE 4 TD
## # ℹ 84 more rows
## # ℹ 31 more variables: other_diagnoses <chr>, n_other_diagnoses <dbl>,
## # age_diagnosis <chr>, AoO <dbl>, wr_time_z <dbl>, wr_error_z <dbl>,
## # nwr_time_z <dbl>, nwr_error_z <dbl>, group.exclusion <chr>, pa.rt <dbl>,
## # pa.acc <dbl>, pa.bis <dbl>, forward <dbl>, backward <dbl>,
## # lexita.acc <dbl>, lexita.rt <dbl>, lexita.bis <dbl>, lextale.acc <dbl>,
## # lextale.rt <dbl>, lextale.bis <dbl>, it.ok.acc <dbl>, en.ok.acc <dbl>, …
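To rename many columns at once by rule rather than one by one, dplyr also offers rename_with(), which applies a function to the column names. The sketch below uses a made-up one-row tibble with invented values and replaces the dots in names such as pa.rt with underscores; whether you want that naming style is a matter of taste.

```r
library(dplyr)

# Toy data: invented values, dotted names as in the dataset
toy <- tibble(ID = "P01", pa.rt = 812, pa.acc = 0.9)

toy_renamed <- toy %>%
  rename(ID_participant = ID) %>%                     # new_name = old_name
  rename_with(~ gsub(".", "_", .x, fixed = TRUE))     # applied to every column name

names(toy_renamed)
```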
relocate
Use relocate() to change column positions, using the same syntax as select() to make it easy to move blocks of columns at once.
relocate(.data, ..., .before = NULL, .after = NULL)
Example: Move the column group after the column ID
df %>% relocate(group, .after = "ID")
## # A tibble: 94 × 38
## ID group age L1_code education name_school class other_diagnoses
## <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 LAUVEN DYS 20.3 IT BA university uni disortografia
## 2 LEVEN TD 20.5 IT BA university uni <NA>
## 3 LAE02 TD 14.2 IT HS LAE 1 <NA>
## 4 LAE03 TD 14.6 IT HS LAE 1 <NA>
## 5 LAE04 DYS 18.7 IT HS LAE 4 <NA>
## 6 LAE05 TD 17.4 IT and ROM (H… HS LAE 4 <NA>
## 7 LAE06 TD 14.3 IT HS LAE 1 <NA>
## 8 LAE07 TD 14.9 IT HS LAE 1 <NA>
## 9 LAE08 TD 14.1 IT HS LAE 1 <NA>
## 10 LAE09 TD 17.1 IT HS LAE 4 <NA>
## # ℹ 84 more rows
## # ℹ 30 more variables: n_other_diagnoses <dbl>, age_diagnosis <chr>, AoO <dbl>,
## # wr_time_z <dbl>, wr_error_z <dbl>, nwr_time_z <dbl>, nwr_error_z <dbl>,
## # group.exclusion <chr>, pa.rt <dbl>, pa.acc <dbl>, pa.bis <dbl>,
## # forward <dbl>, backward <dbl>, lexita.acc <dbl>, lexita.rt <dbl>,
## # lexita.bis <dbl>, lextale.acc <dbl>, lextale.rt <dbl>, lextale.bis <dbl>,
## # it.ok.acc <dbl>, en.ok.acc <dbl>, en.ok.rt <dbl>, en.ok.bis <dbl>, …
Example: Move the columns wr_time_z, wr_error_z, nwr_time_z and nwr_error_z after the column ID and then move age before wr_time_z
df %>%
### step 1
relocate(wr_time_z:nwr_error_z, .after = "ID") %>%
### step 2
relocate(age, .before = "wr_time_z")
## # A tibble: 94 × 38
## ID age wr_time_z wr_error_z nwr_time_z nwr_error_z L1_code education
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 LAUVEN 20.3 -0.58 -1.23 -1.05 -1.45 IT BA
## 2 LEVEN 20.5 -0.23 0.1 -1.05 0.43 IT BA
## 3 LAE02 14.2 -0.97 0.59 -0.34 0.99 IT HS
## 4 LAE03 14.6 1.65 0.92 0.63 0.18 IT HS
## 5 LAE04 18.7 -4.58 1.04 -0.99 -0.43 IT HS
## 6 LAE05 17.4 -1.42 0.45 -0.03 -1.23 IT and RO… HS
## 7 LAE06 14.3 0.42 0.59 0.04 0.72 IT HS
## 8 LAE07 14.9 -1.4 -0.39 -1.17 0.45 IT HS
## 9 LAE08 14.1 -0.33 -0.39 -0.44 -0.36 IT HS
## 10 LAE09 17.1 0.32 1.04 0.46 1.15 IT HS
## # ℹ 84 more rows
## # ℹ 30 more variables: name_school <chr>, class <chr>, group <chr>,
## # other_diagnoses <chr>, n_other_diagnoses <dbl>, age_diagnosis <chr>,
## # AoO <dbl>, group.exclusion <chr>, pa.rt <dbl>, pa.acc <dbl>, pa.bis <dbl>,
## # forward <dbl>, backward <dbl>, lexita.acc <dbl>, lexita.rt <dbl>,
## # lexita.bis <dbl>, lextale.acc <dbl>, lextale.rt <dbl>, lextale.bis <dbl>,
## # it.ok.acc <dbl>, en.ok.acc <dbl>, en.ok.rt <dbl>, en.ok.bis <dbl>, …
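Two further behaviours of relocate() are worth trying on a small example: without .before or .after the selected columns move to the front, and columns can be selected by type with where(). The sketch below uses a made-up one-row tibble with invented values.

```r
library(dplyr)

# Toy data: invented values
toy <- tibble(ID = "P01", group = "DYS", age = 20.3,
              L1_code = "IT", wr_time_z = -0.58)

# No .before/.after: the column is moved to the front
front <- toy %>% relocate(wr_time_z)

# Select by type: move all numeric columns right after ID
num_after_id <- toy %>% relocate(where(is.numeric), .after = ID)

names(front)
names(num_after_id)
```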
pivot_longer & pivot_wider
pivot_longer() “lengthens” data, increasing the number of rows and decreasing the number of columns.
pivot_wider() “widens” data, increasing the number of columns and decreasing the number of rows.
Examples
df %>%
dplyr::select(ID, wr_time_z:nwr_error_z) -> ex.1
### Original dataset
ex.1
## # A tibble: 94 × 5
## ID wr_time_z wr_error_z nwr_time_z nwr_error_z
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 LAUVEN -0.58 -1.23 -1.05 -1.45
## 2 LEVEN -0.23 0.1 -1.05 0.43
## 3 LAE02 -0.97 0.59 -0.34 0.99
## 4 LAE03 1.65 0.92 0.63 0.18
## 5 LAE04 -4.58 1.04 -0.99 -0.43
## 6 LAE05 -1.42 0.45 -0.03 -1.23
## 7 LAE06 0.42 0.59 0.04 0.72
## 8 LAE07 -1.4 -0.39 -1.17 0.45
## 9 LAE08 -0.33 -0.39 -0.44 -0.36
## 10 LAE09 0.32 1.04 0.46 1.15
## # ℹ 84 more rows
### Pivot longer
ex.1 %>%
  pivot_longer(
    ### here we specify which columns need to be manipulated, so from the second to the fifth
    cols = 2:5,
    names_to = "Reading Measure",
    values_to = "Reading Value")
## # A tibble: 376 × 3
## ID `Reading Measure` `Reading Value`
## <chr> <chr> <dbl>
## 1 LAUVEN wr_time_z -0.58
## 2 LAUVEN wr_error_z -1.23
## 3 LAUVEN nwr_time_z -1.05
## 4 LAUVEN nwr_error_z -1.45
## 5 LEVEN wr_time_z -0.23
## 6 LEVEN wr_error_z 0.1
## 7 LEVEN nwr_time_z -1.05
## 8 LEVEN nwr_error_z 0.43
## 9 LAE02 wr_time_z -0.97
## 10 LAE02 wr_error_z 0.59
## # ℹ 366 more rows
Example for pivot_wider
ex.1 %>%
  pivot_longer(
    ### here we specify which columns need to be manipulated, so from the second to the fifth
    cols = 2:5,
    names_to = "Reading Measure",
    values_to = "Reading Value") -> ex.2
### Original data
ex.2
## # A tibble: 376 × 3
## ID `Reading Measure` `Reading Value`
## <chr> <chr> <dbl>
## 1 LAUVEN wr_time_z -0.58
## 2 LAUVEN wr_error_z -1.23
## 3 LAUVEN nwr_time_z -1.05
## 4 LAUVEN nwr_error_z -1.45
## 5 LEVEN wr_time_z -0.23
## 6 LEVEN wr_error_z 0.1
## 7 LEVEN nwr_time_z -1.05
## 8 LEVEN nwr_error_z 0.43
## 9 LAE02 wr_time_z -0.97
## 10 LAE02 wr_error_z 0.59
## # ℹ 366 more rows
### Pivot wider
ex.2 %>%
pivot_wider(names_from = "Reading Measure",
values_from = "Reading Value")
## # A tibble: 94 × 5
## ID wr_time_z wr_error_z nwr_time_z nwr_error_z
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 LAUVEN -0.58 -1.23 -1.05 -1.45
## 2 LEVEN -0.23 0.1 -1.05 0.43
## 3 LAE02 -0.97 0.59 -0.34 0.99
## 4 LAE03 1.65 0.92 0.63 0.18
## 5 LAE04 -4.58 1.04 -0.99 -0.43
## 6 LAE05 -1.42 0.45 -0.03 -1.23
## 7 LAE06 0.42 0.59 0.04 0.72
## 8 LAE07 -1.4 -0.39 -1.17 0.45
## 9 LAE08 -0.33 -0.39 -0.44 -0.36
## 10 LAE09 0.32 1.04 0.46 1.15
## # ℹ 84 more rows
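The round trip above can also be checked on a small made-up tibble. In this sketch (invented values, illustrative column names measure and value) the columns to pivot are chosen by name with !ID instead of by position, and the new column names contain no spaces, so they do not need backticks later on.

```r
library(dplyr)
library(tidyr)

# Toy data: invented values
toy <- tibble(ID = c("P01", "P02"),
              wr_time_z = c(-0.58, -0.23),
              nwr_time_z = c(-1.05, -1.05))

# Long format: one row per participant and measure
toy_long <- toy %>%
  pivot_longer(cols = !ID,            # every column except ID
               names_to = "measure",
               values_to = "value")

# Back to wide format: one row per participant
toy_wide <- toy_long %>%
  pivot_wider(names_from = measure, values_from = value)

dim(toy_long)
names(toy_wide)
```

Selecting by name keeps the code working even if the column order of the data changes.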