1. Introduction

1.0 Welcome!

Welcome to Introduction to R! This is where I am storing all my notes about R from the PSYC3361 online modules. This document is intended to be a master “cheat sheet” of sorts, so that all the commands are in one, easily accessible place. Writing my notes in R Markdown is also a handy way for me to actively practise what I’ve learned. Happy reading and happy coding!

1.1 What are Markdown, R and R Markdown?

Markdown is used exclusively for text and formatting.
- Allows you to make annotations; kind of like a word doc!
- Files are called “markdown files” with the file extension .md
R is a programming language
- Allows you to write and run code, which is useful for data analysis
- Files are called “R Script” with the file extension .R
- Used when running analyses
R Markdown combines R and markdown.
- This means that you can both write text and insert chunks of code
- Useful for these kinds of text notes where you also want to insert code, because you are able to both see the backend code as well as its output side by side
- Files are called “R Markdown” with the file extension .Rmd

2. Markdown Formatting Guide

2.0 Text Formatting

Italics: (*text*)
Bold: (**text**)
Italics + bold: (***text***)

2.1 Dot points and Numbered Lists

1. There’s not much to say here
2. other than the fact
3. that numbered lists and

- dot points
- format themselves
- and all you need to do
- is use dashes and numbers
  - you can also add an indent
  - to create nested lists!

2.2 Headings and Paragraphs

I am not going to give examples here for the sake of not screwing up the table of contents; but throughout this document you’ll see many examples of headings.
Heading 1: (# text)
Heading 2: (## text)
Heading 3: (### text)
- … and so forth
- nb: a space should be added after the #, as seen above
To force a line break, add the HTML tag <br/> where you would like the break (a blank line on its own also starts a new paragraph)
- you can do this at the end of a paragraph (new paragraph), or in a new line (line break)

2.3 Adding media: Hyperlinks and Images

Hyperlinks

Here is an example hyperlink: Danielle Navarro’s R Youtube Series
Syntax: [text](hyperlink)
- Alternatively you can just paste the link directly and it will be an automatic hyperlink!

Images

Here is an example image

The pathway can be to a file location in your computer OR an image address from the internet
Syntax: ![](https://…pathway)

2.4 Block Quotes

Here is an example of a block quote!

Syntax: Add a “>” before your quote

2.5 Equations

Anything between single dollar signs ($…$) is treated as “inline” (i.e. in-text) maths
- e.g. $x^2$ is inline
Anything between double dollar signs ($$…$$) is a standalone equation (i.e. on its own line and centred)
- e.g. the following equation is standalone: $$a^2 + b^2 = c^2$$
nb:
- there must not be any whitespace between your dollar signs and the equation
- equations follow LaTeX rules

2.6 Using backslashes to comment out syntax

Notice how throughout this document I have been able to type syntax without it turning into the relevant formatting.
This is because I added a backslash (\) before the syntax! (This is known as “escaping” the character.)
Note that you will (annoyingly) have to add a backslash before each individual syntax character
- For example, if using ** to bold something, you need to add a backslash before each asterisk (otherwise the unescaped asterisk will still be treated as formatting)

3. R Markdown Formatting Guide

nb: everything covered above in the Markdown formatting guide is relevant to R Markdown!

3.0 What is a YAML Header?

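YAML stands for “YAML Ain’t Markup Language” (originally “Yet Another Markup Language”)
Header used in R Markdown, placed at the very top of the file between two lines of three dashes (---)
Sets the core details of your document (e.g. title, author, date and output format)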
Can also be customised:
- type of output file: pdf_document, html_document and word_document
  - nb this can also be done by selecting the options in the “knit” dropdown menu
- Add a table of contents (toc: true) and make it float (toc_float: true)
- Change the theme of the output file (some that I like are default, cerulean, paper, readable and yeti)
Some YAML formatting things to be aware of:
- YAML can be quite picky with indents and colons
- sub-items should be started in a new line (with an indent) and NOT directly after the colon
- it can be directly after the colon if it refers to that item itself (i.e. is not a sub-item)
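
To make the indent and colon rules above concrete, here is a minimal sketch of a header that uses the options mentioned in this section. It is written out from R with writeLines() purely so that it can be run and inspected; in a real .Rmd file you would type these lines directly at the top of the document. The title and the choice of the cerulean theme are just illustrative.

```r
# A sketch of a YAML header, stored as lines of text so the indentation is visible.
header <- c(
  "---",
  'title: "Introduction to R"',  # the value sits directly after the colon
  "output:",                     # sub-items go on new, indented lines
  "  html_document:",
  "    toc: true",
  "    toc_float: true",
  "    theme: cerulean",
  "---"
)
writeLines(header)
```

Each level of nesting is indented by two spaces, and only values that belong to the item itself (like the title) share a line with the colon.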

3.1 Inserting R code

Code > Insert chunk
Denoted by ```{r}
Example below:

print("hello world!")

## [1] "hello world!"

4. R commands

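Welcome to the actual coding portion of R! Please note that the code below reads data files stored on my own computer (when knitting, R Markdown looks for files relative to the folder containing the .Rmd file, unless a full path is given), so you will not be able to run it as-is and these notes might be a little disjointed or incomplete. I’d recommend using these notes to interpret an actual R script, so that you can see the code in action while also understanding the function of each line of code.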

4.0 Basics of R

When you code in R, there are three main parts that you will be working with (in the following order):

Comments

#formatted something like this
Text comments are used to organise your code and to explain what the line/chunk of code that follows is doing
“Commenting out” is a handy technique for disabling a line of code (R ignores everything after a # on that line)

R Script

The actual code part (found in the top left of RStudio)
Runs all your analyses

Console

Output after having run the code (found in the bottom left of RStudio)
The console can also be used for running code, but we mainly write/run code in its own script so that it’s saveable etc.

You’ll see all this in action below!

NOTE THE FOLLOWING FOUNDATIONAL TERMINOLOGY

Given the line:

ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

“ggplot(…)” is a function
“data = mpg” is a named argument
- data is the name of the argument
“mpg” on its own, as in “ggplot(mpg)”, is an unnamed argument
“mpg” is the value passed to the argument (here, a variable that holds a dataset)

A note about named arguments

you don’t actually need to name the arguments; R matches unnamed arguments to the function’s parameters by position (i.e. the order in which the function defines them)
removing the names makes code more compact
on the other hand, named arguments make it clearer what your code is doing
- useful if you do not remember the argument order
- named arguments are recommended for now since you are still an R beginner
so the above code can actually be condensed as:

ggplot(mpg, aes(displ,hwy))
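
The same idea can be tried without any packages. The sketch below uses the base R function seq(), whose first three parameters are from, to and by; the numbers are arbitrary and only there for illustration.

```r
# seq() builds a sequence; its first three parameters are from, to and by.
named   <- seq(from = 1, to = 10, by = 3)
unnamed <- seq(1, 10, 3)          # matched by position: from, to, by
print(named)
identical(named, unnamed)         # both calls build the same sequence

# Named arguments can be given in any order; unnamed ones cannot.
reordered <- seq(by = 3, to = 10, from = 1)
identical(named, reordered)
```

Swapping the order of the unnamed version (e.g. seq(10, 1, 3)) would change its meaning, which is why naming arguments is the safer habit while the argument order is still unfamiliar.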

4.1 Reading Raw Data (.csv files)

What is Tidyverse?

Tidyverse is a collection of R packages designed for data analysis (e.g. dplyr, ggplot2 and readr)
It comes with a bunch of handy data-analysis-related commands and functions
See below code:

#load packages
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

#read the data
frames <- read_csv(file = "C:\\Users\\Alyss\\Downloads\\data_reasoning.csv")

## Rows: 4725 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): gender, condition, sample_size
## dbl (5): id, age, n_obs, test_item, response
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

In the above script, we have:

loaded the tidyverse package
read the relevant datafile using the “read_csv” function
stored the result in a variable named “frames”, by typing “frames <-”
- “frames” holds the data itself (the file is read once); using “frames” later refers to that stored data and does not re-run read_csv
in RStudio, we can also now view the dataframe that has been read, in the “Environment” pane (top right in RStudio)
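
A small sketch of what “<-” does, using a made-up function in place of read_csv (read_numbers and times_read are hypothetical names, not part of any package). The function counts how often it is called, which shows that using the variable afterwards does not run the code again.

```r
# Hypothetical stand-in for read_csv(): it counts how many times it is called.
times_read <- 0
read_numbers <- function() {
  times_read <<- times_read + 1   # record that the "file" was read
  c(8, 7, 6)
}

numbers <- read_numbers()  # the function runs once, here
print(numbers)             # using the variable just shows the stored values
print(mean(numbers))
print(times_read)          # still 1: nothing was re-read
```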

Interpreting the console output

“chr” refers to “character” = string variables
“dbl” refers to “double” = numeric variables
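
These two types can be checked directly in base R with typeof(). The values below are borrowed from the first row of the output above; the variable names are only for illustration.

```r
age_type    <- typeof(36)       # "double": plain numbers are stored as doubles
gender_type <- typeof("male")   # "character": anything in quotes is text
print(age_type)
print(gender_type)

# A number inside quotes is text, so it must be converted before doing maths.
age_text <- "36"
age_next_year <- as.numeric(age_text) + 1
print(age_next_year)
```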

Printing and Glimpsing your Data

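Print: Prints the dataframe (the first 10 rows, and as many columns as fit on the screen)
Glimpse: Prints a transposed view with one line per column, showing each column’s type and its first few values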

print(frames)

## # A tibble: 4,725 × 8
##       id gender   age condition sample_size n_obs test_item response
##    <dbl> <chr>  <dbl> <chr>     <chr>       <dbl>     <dbl>    <dbl>
##  1     1 male      36 category  small           2         1        8
##  2     1 male      36 category  small           2         2        7
##  3     1 male      36 category  small           2         3        6
##  4     1 male      36 category  small           2         4        6
##  5     1 male      36 category  small           2         5        5
##  6     1 male      36 category  small           2         6        6
##  7     1 male      36 category  small           2         7        3
##  8     1 male      36 category  medium          6         1        9
##  9     1 male      36 category  medium          6         2        7
## 10     1 male      36 category  medium          6         3        5
## # ℹ 4,715 more rows

glimpse(frames)

## Rows: 4,725
## Columns: 8
## $ id          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ gender      <chr> "male", "male", "male", "male", "male", "male", "male", "m…
## $ age         <dbl> 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36…
## $ condition   <chr> "category", "category", "category", "category", "category"…
## $ sample_size <chr> "small", "small", "small", "small", "small", "small", "sma…
## $ n_obs       <dbl> 2, 2, 2, 2, 2, 2, 2, 6, 6, 6, 6, 6, 6, 6, 12, 12, 12, 12, …
## $ test_item   <dbl> 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6…
## $ response    <dbl> 8, 7, 6, 6, 5, 6, 3, 9, 7, 5, 6, 4, 4, 2, 8, 7, 6, 6, 4, 1…

4.2 Summarising Data

Why are Data Summaries important?

Allows us to group data (e.g by condition)
Also allows us to compute variables (e.g mean)
See below code:

#summarise my data
data_summary <- frames %>% 
  group_by(test_item, condition, sample_size) %>% 
  summarise(
    mean_resp = mean(response),
    sd_resp = sd(response)
  ) %>% 
  ungroup()

## `summarise()` has grouped output by 'test_item', 'condition'. You can override
## using the `.groups` argument.

In the above script, we have:

grouped the “frames” data by test item, condition and sample size using the “group_by(…)” function, THEN
computed the variable “mean_resp” using the mean(…) function AND
computed the variable “sd_resp” using the sd(…) function, THEN
ungrouped the summary (data hygiene practice; so that later operations on data_summary are not unexpectedly carried out group by group)
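
The message printed after the code above appears because summarise() only removes the last grouping variable by default. As a sketch on a tiny made-up dataset (the values are illustrative only), the `.groups` argument mentioned in that message can do the ungrouping in the same step:

```r
library(dplyr)

# Tiny made-up dataset (values are illustrative only)
toy <- tibble(
  condition   = rep(c("category", "property"), each = 4),
  sample_size = rep(c("small", "large"), times = 4),
  response    = c(6, 8, 4, 10, 5, 7, 3, 9)
)

toy_summary <- toy %>%
  group_by(condition, sample_size) %>%
  summarise(
    mean_resp = mean(response),
    sd_resp   = sd(response),
    .groups   = "drop"   # same effect as adding ungroup() afterwards
  )

print(toy_summary)
is_grouped_df(toy_summary)  # FALSE: no grouping is left over
```

Either approach works; a separate ungroup() step is arguably easier to spot when reading the pipeline back later.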

Writing the summary data to a new file

#write summary to file
write_csv(data_summary, file = "data_summary.csv")

#print the summary
print(data_summary)

## # A tibble: 42 × 5
##    test_item condition sample_size mean_resp sd_resp
##        <dbl> <chr>     <chr>           <dbl>   <dbl>
##  1         1 category  large            7.60    2.36
##  2         1 category  medium           7.32    2.49
##  3         1 category  small            6.07    2.82
##  4         1 property  large            7.16    2.23
##  5         1 property  medium           6.66    2.40
##  6         1 property  small            5.78    2.57
##  7         2 category  large            7.51    2.01
##  8         2 category  medium           7.17    1.99
##  9         2 category  small            6.26    2.28
## 10         2 property  large            7.20    1.84
## # ℹ 32 more rows

In the above script, we have:

saved data_summary into a new csv file named “data_summary.csv”, using the write_csv(…) function
printed data_summary
This now summarises the data and shows the means and standard deviations. Pretty cool!

4.3 Data Visualisation

What is ggplot?

ggplot2 is an R package (part of the tidyverse) specifically for data visualisation; its main function is ggplot()
It comes with a bunch of handy data-visualisation-related commands and functions
plots can be viewed in the “Plots” pane (bottom right of RStudio)

4.3.1 Scatterplots (Intro visualisation)

See below code:

#load packages
library(tidyverse)

#visualise mpg data in a scatterplot
picture <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(
      x = displ,
      y = hwy,
      color = cyl
      # color = factor(cyl)
      ),
    # color = "purple"
    size = 4
    )+
  geom_smooth(
    mapping = aes(
      x = displ,
      y = hwy,
    )
  ) +
  geom_rug(
    mapping = aes(
      x = displ,
      y = hwy,
    )
  )

#print the ggplot object
print(picture)

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

In the above script, we have:

loaded the tidyverse package
used the ggplot(…) function to prep data for visualisation (BASE LAYER)

nb: mpg is a sample dataset that comes with the ggplot2 package (which is loaded as part of the tidyverse)

used the geom_point(mapping = …) function to visualise mpg data in a scatterplot (FIRST LAYER)
used the aes(…) function to provide the scatterplot aesthetics

aesthetics is a fancy term for plot features. these include:
variables used for x and y axes
any other variables that can be visualised in the datapoints themselves
- for example, in this specific example, the colour gradient of the datapoints visualise the number of cylinders
  - note the following commented out “color = factor(cyl)”
  - all the “factor(…)” function does is transform cyl into a categorical rather than continuous variable (which is then represented in the colours)
- this can be visualised in other ways too (e.g try “size”, “fill”, “shape”)

added an additional parameter “size” within the geom_point(mapping = …) scatterplot

parameters are similar to aesthetics, but instead of being linked to a variable they set one fixed value for the formatting of the plot
try size, fill, shape etc
nb: if you specify something as both an aesthetic and a parameter in the same layer, the fixed parameter value takes over and the aesthetic mapping is ignored
- I have added a commented-out color parameter if you would like to see what this looks like (remember to add a comma after it when uncommenting, since size = 4 follows)

used the geom_smooth(mapping = …) function to add a smooth regression line to the plot (SECOND LAYER)

Notice how you basically repeat everything you already had with the geom_point function
This is because you are basically layering multiple different functions on top of each other (as denoted by the plus sign)
here, we are specifying the aesthetics for every layer
- we don’t really need to do this; see below for global and local mappings
- however doing it this way provides the advantage of using different aesthetics for different layers - useful for layered comparison

used the geom_rug(mapping = …) function to add lines to the axes to further visualise each datapoint (THIRD LAYER)
summarised all this in a variable named “picture” by using “picture <-”
printed “picture” (i.e. printed the scatterplot)
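
To see the difference between an aesthetic and a parameter side by side, here is a rough sketch using the same mpg data. The three plot names are made up for this example, and layer_data() is used only to peek at the colours ggplot2 ends up assigning to each point.

```r
library(ggplot2)

# Aesthetic: colour is mapped to a variable inside aes(), so it varies by row.
# factor() makes cyl categorical, giving one distinct colour per cylinder count.
mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = factor(cyl)))

# Parameter: colour is set outside aes(), so every point gets the same value.
fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "purple")

# Both at once: the fixed parameter takes over and the mapping is ignored.
both <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = factor(cyl)),
             color = "purple")

# layer_data() returns the values ggplot2 computed for a layer.
mapped_colours <- unique(layer_data(mapped)$colour)  # one colour per cyl level
both_colours   <- unique(layer_data(both)$colour)    # just "purple"
print(mapped_colours)
print(both_colours)
```

Printing mapped, fixed or both (e.g. print(both)) draws the corresponding plot.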

Global and Local Mappings

Note also that to tidy up your above code and remove repeated chunks, you can simply apply “global mapping” where the aesthetics or parameters are the same throughout
see below code:

#visualise mpg data in a scatterplot
global_picture <- ggplot(data = mpg,
  mapping = aes(x = displ, y = hwy) 
  ) +
  geom_point(mapping = aes(color = factor(cyl)), size = 4) +
  geom_smooth() +
  geom_rug()


#print the ggplot object
print(global_picture)

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Notice here how we’ve applied mapping = aes(x = …, y = …) directly to the ggplot function (rather than to each of the geom functions)
this alters the base layer so that these aesthetics are inherited by every layer in this plot
- but we can also add additional aesthetics/parameters to each individual layer

4.3.2 Box, Violin & Column Plots

Same as the scatterplot, but you can play around with different types of plots

#read the data_forensic csv
data_forensic <- read_csv(file = "C:\\Users\\Alyss\\Downloads\\data_forensic.csv")

## Rows: 5700 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): handwriting_expert, us, condition, forensic_scientist, forensic_spe...
## dbl (7): participant, age, handwriting_reports, confidence, familiarity, est...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#visualise data_forensic in a box and whisker plot
plot_1 <- ggplot(data_forensic) + geom_boxplot(aes(band, est))

#draw plots
print(plot_1)

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

#visualise data_forensic in a violin plot
plot_2 <- ggplot(data_forensic) + geom_violin(aes(band, est))

#draw plots
print(plot_2)

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_ydensity()`).

#visualise data_forensic in a column plot
plot_3 <- ggplot(data_forensic) + geom_col(aes(band, est))

#draw plots
print(plot_3)

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_col()`).

4.3.3 Facets - Creating Sub-Plots

A facet is a sub-plot
We can break plots into multiple sub-plots, for example if we want to group the data.
The below code breaks the data_forensic box plots into facets, grouped by participant expertise

#create facets
by_expertise <- ggplot(data_forensic) +
  geom_boxplot(aes(band, est)) +
  facet_wrap(vars(handwriting_expert))

#print facets
print(by_expertise)

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

In the above script, we have:

used the ggplot(…) and geom_boxplot(…) functions to plot a box-and-whisker plot
used the facet_wrap(…) function to create subplots/facets that are grouped using the variable vars(handwriting_expert)
printed the facets

4.3.4 Beautifying your plots

The below are just some useful methods to clean up and beautify your plots. We will use the above facets as the plot to beautify.

Aesthetics

fill = band: maps the fill colour of each box to the “band” variable, so that each band gets its own colour

Parameters

theme_minimal(): changes the background to white instead of grey
theme_dark(): changes the background to dark instead of grey
scale_x_discrete() and scale_y_discrete(): edit the x and y axes when they show a categorical variable (for a numeric variable, the equivalents are scale_x_continuous() and scale_y_continuous())
- name = NULL: removes axis title
- labels = NULL: removes the axis labels
- replacing NULL with “…” gives it a title or label
ggtitle(): edits plot title
- label = “…”: Gives the plot a title
- subtitle = “…”: Gives the plot a subtitle
scale_fill_viridis_d(): swaps the existing fill colours for the (discrete) viridis colour palette
- alpha = …: opacity of the fill, from 0 (transparent) to 1 (solid)
- name = NULL: removes the title of the fill legend

pic <- ggplot(
  data = data_forensic
) +
  geom_boxplot(
    mapping = aes(
      x = band,
      y = est,
      fill = band
    )
  ) +
  facet_wrap(
    vars(handwriting_expert)
  ) +
  theme_minimal() +
  scale_x_discrete(
    name = NULL, #(name refers to the axis title)
    labels = NULL #(labels refer to the axis labels)
  ) +
  scale_y_continuous( #(est is numeric, so it takes a continuous scale)
    name = "Estimated Probability"
  ) +
  ggtitle(
    label = "Handwriting estimates for experts and novices",
    subtitle = "Source: Matire et al."
  ) +
  scale_fill_viridis_d(
    alpha = .5,
    name = NULL
  )

print(pic)

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

play around with commenting out the various beautifying functions to see what each one does

4.4 Data Wrangling - Using dplyr

Data wrangling is the process of cleaning up your raw data so that it is in an analysable and interpretable format. Think turning raw keypress data into accuracy data. In other words, it is the preprocessing of data before data is analysed, and involves cleaning and filtering your data, and computing important variables.

dplyr is a package within tidyverse which we will use for data wrangling.

4.4.1 Renaming variables

#Import the SWOW data

library(tidyverse)
swow <- read_tsv(file = "C:\\Users\\Alyss\\Downloads\\data_swow.csv.zip")

## Multiple files in zip: reading 'swow.csv'
## Rows: 483636 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): cue, response
## dbl (3): R1, N, R1.Strength
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

swow <- swow %>% mutate(id = 1:n())
print(swow)

## # A tibble: 483,636 × 6
##    cue   response    R1     N R1.Strength    id
##    <chr> <chr>    <dbl> <dbl>       <dbl> <int>
##  1 a     one         21    97      0.216      1
##  2 a     the         16    97      0.165      2
##  3 a     b            9    97      0.0928     3
##  4 a     an           4    97      0.0412     4
##  5 a     first        3    97      0.0309     5
##  6 a     letter       3    97      0.0309     6
##  7 a     alphabet     2    97      0.0206     7
##  8 a     apple        2    97      0.0206     8
##  9 a     article      2    97      0.0206     9
## 10 a     bat          2    97      0.0206    10
## # ℹ 483,626 more rows

#manual variable name cleaning new name = old name (case sensitive)

swow <- swow %>% 
  rename(n_response = R1,
         n_total = N,
         strength = R1.Strength)
print(swow)

## # A tibble: 483,636 × 6
##    cue   response n_response n_total strength    id
##    <chr> <chr>         <dbl>   <dbl>    <dbl> <int>
##  1 a     one              21      97   0.216      1
##  2 a     the              16      97   0.165      2
##  3 a     b                 9      97   0.0928     3
##  4 a     an                4      97   0.0412     4
##  5 a     first             3      97   0.0309     5
##  6 a     letter            3      97   0.0309     6
##  7 a     alphabet          2      97   0.0206     7
##  8 a     apple             2      97   0.0206     8
##  9 a     article           2      97   0.0206     9
## 10 a     bat               2      97   0.0206    10
## # ℹ 483,626 more rows

In the above script, we have:

read a zipped tab-separated file with the read_tsv(…) function
stored the resulting data in the variable “swow”
used mutate(id = 1:n()) to add an “id” column that numbers the rows from 1 to the total number of rows
used the rename(new_name = old_name) function to manually rename our variables
- note that this is case sensitive; if it’s not working, case could be the reason
- compare the output to see changes in variable names

4.4.2 Selecting Variables (Columns), Filtering Rows and Rearranging Data

#filtering for response = woman, response was given by more than one person
woman_bck <- swow %>% 
  filter(response == "woman", n_response >1) %>% 
  arrange(desc(strength)) %>% #decreasing strength
  select(cue, response, strength, id)
  #alternatively you can also filter out irrelevant variables:
  #select(-n_response, -n_total)
  #select(-starts_with("n_"))

print(woman_bck)

## # A tibble: 200 × 4
##    cue       response strength     id
##    <chr>     <chr>       <dbl>  <int>
##  1 man       woman       0.576 258593
##  2 lady      woman       0.36  240149
##  3 feminist  woman       0.303 158641
##  4 female    woman       0.232 158492
##  5 pregnant  woman       0.18  327286
##  6 housewife woman       0.17  209394
##  7 vagina    woman       0.17  459047
##  8 dame      woman       0.167 105474
##  9 menopause woman       0.16  266238
## 10 uterus    woman       0.16  458533
## # ℹ 190 more rows

In the above script, we have:

used the filter(…) function to keep only the rows that meet our conditions
- note the double equal signs “==”, which test for equality (a single “=” assigns a value or names an argument)
- conditions separated by a comma must all be true for a row to be kept
used the arrange(…) function to arrange the data by strength
- desc(strength) arranges it in decreasing order; without desc(), arrange() sorts in increasing order
used the select(var1, var2, var3…) function to select the variables/columns of interest (and drop the rest)
- note that you can also drop columns instead of selecting them, using select(-var1, -var2) (i.e. notice the negative signs)
- starts_with("n_") is a handy shortcut that matches every column whose name begins with “n_”
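
What filter() is doing with “==” and the comma can be sketched in base R, without any packages. The two short vectors below are made up for illustration; combining conditions with & mirrors how filter() treats comma-separated conditions.

```r
response   <- c("woman", "man", "woman", "lady")
n_response <- c(3, 5, 1, 2)

is_woman <- response == "woman"   # one TRUE/FALSE per element
print(is_woman)

# Both conditions must hold, just like filter(response == "woman", n_response > 1)
keep <- response == "woman" & n_response > 1
print(keep)
print(response[keep])             # only the elements that passed both tests
```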

4.4.3 Computing New Variables

#Computing a new "rank" variable which ranks "strength"
#Also creating new values for data that we already have (but just making the table cleaner and more readable)
woman_bck <- swow %>% 
  filter(response == "woman", n_response >1) %>% 
  arrange(desc(strength)) %>% #decreasing strength
  select(-starts_with("n_")) %>% 
  mutate(rank = rank(-strength),
         type = "backward",
         word = response,
         associate = cue
         )

print(woman_bck)

## # A tibble: 200 × 8
##    cue       response strength     id  rank type     word  associate
##    <chr>     <chr>       <dbl>  <int> <dbl> <chr>    <chr> <chr>    
##  1 man       woman       0.576 258593   1   backward woman man      
##  2 lady      woman       0.36  240149   2   backward woman lady     
##  3 feminist  woman       0.303 158641   3   backward woman feminist 
##  4 female    woman       0.232 158492   4   backward woman female   
##  5 pregnant  woman       0.18  327286   5   backward woman pregnant 
##  6 housewife woman       0.17  209394   6.5 backward woman housewife
##  7 vagina    woman       0.17  459047   6.5 backward woman vagina   
##  8 dame      woman       0.167 105474   8   backward woman dame     
##  9 menopause woman       0.16  266238   9.5 backward woman menopause
## 10 uterus    woman       0.16  458533   9.5 backward woman uterus   
## # ℹ 190 more rows

In the above script, we have:

done everything done above
used the mutate(…) function to compute the new variables “rank”, “type”, “word” and “associate”
used rank = rank(-strength) to rank by strength
- note the negative sign;
- this is just because the rank() function ranks in ascending order
- so to rank in descending order the strengths are turned negative
- tied values share the average of their ranks, which is why 6.5 and 9.5 appear in the output
used type = “backward” to add a constant value that labels these data as “backward”
used word = response and associate = cue to copy the “response” and “cue” variables into columns with more interpretable names
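
The behaviour of rank() is easy to check on a few numbers. The sketch below uses four of the strength values from the output above; the ties.method option at the end is an alternative that the original script does not use, shown only in case whole-number ranks are preferred.

```r
strength <- c(0.18, 0.17, 0.17, 0.167)

ascending  <- rank(strength)    # smallest value gets rank 1
descending <- rank(-strength)   # largest value gets rank 1
print(ascending)
print(descending)               # the two tied values share rank 2.5

# Alternative (not used in the script above): give ties the lowest rank instead.
descending_min <- rank(-strength, ties.method = "min")
print(descending_min)
```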

5. Useful shortcuts

Comment: Ctrl+Shift+C
Clear Console: Ctrl+L
Pipe (%>%): Ctrl+Shift+M
Run current line or selection: Ctrl+Enter
Run whole script (source): Ctrl+Shift+S
Save: Ctrl+S

Introduction to R

Alyssa Lim

2024-06-05
