coding is fun

WELCOME

Welcome to the PS2010 Code Book. Here you will find a collection of the R code used for all data analyses covered across the module. Use the menu on the left-hand side to navigate.

Some important tips:

  • The data used in all of this code will be referred to as “mydata”. Make sure you change this if you have called your data something else.

  • Wherever you see NULL, NULL1, NULL2, etc., you will need to add information based on the data set you are using.

1) Basics

Install and Load Packages


Install a package

install.packages(NULL)

install.packages("tidyverse") #An example of how this code works

Change NULL to the name of the package, in quotation marks.


Load a package

library(NULL)

library(tidyverse) #An example of how this code works

Change NULL to the name of the package.


Packages Used Across PS2010

install.packages("tidyverse")
install.packages("rstatix")
install.packages("car")
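
Packages only need to be installed once per computer, but they must be loaded with library() in every new R session. Below is a minimal sketch for checking which of the module packages still need installing. The helper name missing_packages is made up for this illustration and is not part of the module code.

```r
# Hypothetical helper: returns the packages in `pkgs` that are not installed yet
missing_packages = function(pkgs) {
  installed = rownames(installed.packages())
  pkgs[!(pkgs %in% installed)]
}

ps2010_packages = c("tidyverse", "rstatix", "car")
to_install = missing_packages(ps2010_packages)

# Only report (and optionally install) if something is actually missing
if (length(to_install) > 0) {
  message("Still to install: ", paste(to_install, collapse = ", "))
  # install.packages(to_install) #uncomment this line to install them
}
```

If nothing is printed, all three packages are already installed and you only need the library() calls.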

Importing Data

Set the Working Directory

setwd(NULL)

setwd("C:/Users/username/documents") #An example of how this code works

Change NULL to the folder path.

Remember you can also set the working directory using the menu options:

Session -> Set Working Directory -> Choose Directory


Check the Working Directory

getwd()

This will tell you the current working directory.


Import Data

mydata = read_csv(NULL)

mydata = read_csv("experiment_data.csv") #An example of how this code works

mydata is the object that we save the data under. You can call it whatever you like.

You can see the data in the Environment panel (top right in RStudio).

Change NULL to the name of the file.

Remember to add “.csv” at the end of the file name.

read_csv() only reads .csv data files. Other file types need a different import function.

Note: read_csv() requires the tidyverse package.
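
The import steps above can be tried out without any real data. The sketch below writes a small made-up data set to a temporary folder, checks that the file exists, and reads it back in. It uses read.csv(), the base R counterpart of read_csv(), only so that it runs without the tidyverse. The file contents are invented for illustration.

```r
# Made-up data written to a temporary file so the example is self-contained
example_file = file.path(tempdir(), "experiment_data.csv")
example_data = data.frame(drink = c("water", "redcow", "water", "redcow"),
                          bpm = c(70, 85, 72, 88))
write.csv(example_data, example_file, row.names = FALSE)

# Check the file is where R expects it before importing
file.exists(example_file)

# read.csv() is base R; read_csv() from the tidyverse is called in the same way
mydata = read.csv(example_file)
head(mydata)
```

If file.exists() returns FALSE for your own file, the working directory or the file name is usually the problem.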


2) Describing Data

Checking a Data Set and Getting a Summary


See First Few Rows of a Data Set

head(NULL)

head(mydata) #An example of how this code works

Variable Names in a Data Set

names(NULL)

names(mydata) #An example of how this code works

Quickly Summarising a Data Set

summary(NULL)

summary(mydata) #An example of how this code works

Change NULL to the name of the object/data file you want to summarise.
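
To see what these three functions return, here is a short sketch on a made-up data set. The drink and bpm values are invented for illustration only.

```r
# Made-up data: 4 participants per drink
mydata = data.frame(drink = rep(c("water", "redcow"), each = 4),
                    bpm = c(68, 72, 70, 74, 84, 88, 86, 90))

head(mydata, 3)     #first 3 rows only (the default is 6)
names(mydata)       #the variable names: drink and bpm
summary(mydata$bpm) #summary() also works on a single variable
```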


Splitting by Group or Variable

NULL1 = mydata %>%
  group_by(NULL2) %>%
  summarise(
    mean = mean(NULL3), sd = sd(NULL3)  )

Change NULL1 to the name of the object/data file you want to create (i.e., the one that will hold the descriptive statistics).

Change NULL2 to the name of the variable you want to split the data by.

Change NULL3 to the name of the dependent variable you want the descriptive statistics for.

See the example below:

summary_by_group = mydata %>%
  group_by(drink) %>%
  summarise(mean = mean(bpm), sd = sd(bpm))

Line by line analysis:

  1. The first line creates an object called summary_by_group which uses mydata AND THEN

  2. It will group mydata by the grouping variable, in this example called drink AND THEN

  3. It calls the summarise() function, which creates a column called mean holding the mean of our dependent variable, using mean(bpm), and a column called sd holding the standard deviation of our dependent variable, using sd(bpm).
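
If you want to double-check the grouped means and standard deviations, the same numbers can be produced with base R. This is a sketch using tapply() instead of group_by() and summarise(), on made-up data.

```r
# Made-up data: 4 participants per drink
mydata = data.frame(drink = rep(c("water", "redcow"), each = 4),
                    bpm = c(68, 72, 70, 74, 84, 88, 86, 90))

# tapply(values, groups, function) applies the function to each group in turn
group_means = tapply(mydata$bpm, mydata$drink, mean)
group_sds = tapply(mydata$bpm, mydata$drink, sd)

# Put the results in one table, similar to the summarise() output
data.frame(drink = names(group_means),
           mean = as.numeric(group_means),
           sd = as.numeric(group_sds))
```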


Mean and Standard Deviation for an Entire Data Set

Use the code above again, but remove the group_by() line of code. See the example below:

summary_overall = mydata %>%
  summarise(mean = mean(bpm), sd = sd(bpm))

3) T-tests I (Independent)

Packages

install.packages("rstatix")
library(tidyverse)
library(rstatix)

Running an Independent T-test

The T-test

NULL1 = t.test(NULL2 ~ NULL3, mydata, var.equal = FALSE)

Change NULL1 to the name of the object to store the t-test results.

Change NULL2 to the name of the dependent variable.

Change NULL3 to the name of the independent variable.

See the example below:

t_test = t.test(bpm ~ drink, mydata, var.equal = FALSE)

print(t_test) #this will print the t-test results to the console.
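
Because the results are stored in an object, individual values can be pulled out with $ rather than read off the printed output. A short sketch on made-up data:

```r
# Made-up data: 4 participants per drink
mydata = data.frame(drink = rep(c("water", "redcow"), each = 4),
                    bpm = c(68, 72, 70, 74, 84, 88, 86, 90))

t_test = t.test(bpm ~ drink, data = mydata, var.equal = FALSE)

t_test$statistic #the t value
t_test$parameter #the degrees of freedom (Welch-adjusted when var.equal = FALSE)
t_test$p.value   #the p-value
t_test$estimate  #the mean of each group
t_test$conf.int  #confidence interval for the difference between the means
```

This is handy when you need to report a value to more decimal places than the console shows.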

Cohen’s d

Note: this requires the ‘rstatix’ package.

mydata %>% 
  cohens_d(NULL1 ~ NULL2, var.equal = FALSE)

Change NULL1 to the name of the dependent variable.

Change NULL2 to the name of the independent variable.

See the example below:

mydata %>% 
  cohens_d(bpm ~ drink, var.equal = FALSE)
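
To see where the effect size comes from, it can be worked out by hand. The sketch below divides the difference between the group means by the square root of the average of the two group variances. Our understanding is that this matches what cohens_d() does when var.equal = FALSE, but treat that as an assumption and check the rstatix documentation. The data are made up.

```r
# Made-up data: 4 participants per drink
mydata = data.frame(drink = rep(c("water", "redcow"), each = 4),
                    bpm = c(68, 72, 70, 74, 84, 88, 86, 90))

redcow_bpm = mydata$bpm[mydata$drink == "redcow"]
water_bpm = mydata$bpm[mydata$drink == "water"]

# Difference between the two group means
mean_diff = mean(redcow_bpm) - mean(water_bpm)

# Square root of the average of the two variances (no pooling by sample size)
sd_average = sqrt((var(redcow_bpm) + var(water_bpm)) / 2)

d = mean_diff / sd_average
d
```

The sign of d depends only on which group is subtracted from which.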

Checking Normality with a Histogram

hist(mydata$bpm[mydata$NULL1 == "NULL2"]) #creates a histogram of bpm for one group

hist(mydata$bpm[mydata$drink == "water"]) #An example of how this code works

Change NULL1 to the name of the grouping variable (and bpm to your dependent variable, if it has a different name).

Change NULL2 to the name of the condition/group you want to plot.


Checking Normality with a Shapiro Test

shapiro.test(mydata$bpm[mydata$NULL1 == "NULL2"]) #runs a Shapiro-Wilk test for one group

shapiro.test(mydata$bpm[mydata$drink == "water"]) #An example of how this code works

Change NULL1 to the name of the grouping variable.

Change NULL2 to the name of the condition/group you want to test.
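
The Shapiro test has to be run once per group. Below is a sketch that stores the result for one group and then gets the p-value for every group in one go with tapply(). The data are simulated with rnorm(), so they are made up for illustration.

```r
# Simulated data: 20 participants per drink
set.seed(2010)
mydata = data.frame(drink = rep(c("water", "redcow"), each = 20),
                    bpm = c(rnorm(20, mean = 70, sd = 5),
                            rnorm(20, mean = 85, sd = 5)))

# One group at a time, storing the result so values can be pulled out with $
shapiro_water = shapiro.test(mydata$bpm[mydata$drink == "water"])
shapiro_water$statistic #the W value
shapiro_water$p.value   #the p-value

# All groups at once: one p-value per level of drink
p_by_group = tapply(mydata$bpm, mydata$drink,
                    function(x) shapiro.test(x)$p.value)
p_by_group
```

A p-value above .05 means the test found no evidence that the scores in that group depart from normality.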


4) T-tests II (Repeated and One-Sample)

Packages

install.packages("rstatix")
library(tidyverse)
library(rstatix)

Preparing Data Set

Make sure that the data set is in long format. If it isn’t, you can use the code below to change it. Ensuring your data is in the correct format is part of data wrangling. The code below will work for a simple wide format data set with 2 columns of data (excluding ID).

mydata_long = mydata %>%
  pivot_longer(cols = c(NULL1, NULL2),
               names_to = "NULL3",
               values_to = "NULL4")

Change NULL1 and NULL2 to your groups/conditions.

Change NULL3 to the name of the variable for the groups/conditions.

Change NULL4 to whatever the values are (e.g., usually the name of the dependent variable).

See the example below:

mydata_long = mydata %>%
  pivot_longer(cols = c(redcow, redcowX),
               names_to = "drink",
               values_to = "attention")

The above code will create a new data set called mydata_long, which will contain all the same values but now in a long format. This uses the pivot_longer() function.
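
To see what wide and long format actually look like, the sketch below builds the long version by hand in base R. This is not how pivot_longer() works internally, and the rows come out in a different order (pivot_longer() keeps each participant's rows together), but the values are the same. The scores are made up.

```r
# Made-up wide data: one row per participant, one column per condition
mydata = data.frame(id = 1:3,
                    redcow = c(12, 15, 11),
                    redcowX = c(14, 18, 13))

# Long format: one row per participant per condition
mydata_long = data.frame(
  id = rep(mydata$id, times = 2),
  drink = rep(c("redcow", "redcowX"), each = nrow(mydata)),
  attention = c(mydata$redcow, mydata$redcowX)
)

mydata      #3 rows, 3 columns
mydata_long #6 rows, 3 columns
```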


Running a Repeated T-test

The T-test

NULL1 = t.test(NULL2 ~ NULL3, mydata, paired = TRUE)

Change NULL1 to the name of the object to store the t-test results.

Change NULL2 to the name of the dependent variable.

Change NULL3 to the name of the independent variable.

See the example below:

redcow_t = t.test(attention ~ drink, mydata_long, paired = TRUE)

print(redcow_t) #this will print the t-test results to the console.
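
The formula version above may not run on every installation. As far as we are aware, from R 4.4.0 onwards t.test() stops with an error if paired = TRUE is used together with a formula. Below is a sketch of an alternative that should work in any version, assuming the data are still in wide format with one column per condition. The scores are made up.

```r
# Made-up wide data: one row per participant, one column per condition
mydata = data.frame(id = 1:5,
                    redcow = c(12, 15, 11, 14, 13),
                    redcowX = c(14, 18, 13, 15, 16))

# Give t.test() the two columns directly instead of a formula
redcow_t = t.test(mydata$redcow, mydata$redcowX, paired = TRUE)

print(redcow_t) #this will print the t-test results to the console.
```

Each row must hold the two scores from the same participant, because the test pairs the values by position.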

Cohen’s d

Note: this requires the ‘rstatix’ package.

mydata %>% 
  cohens_d(NULL1 ~ NULL2, paired = TRUE)

Change NULL1 to the name of the dependent variable.

Change NULL2 to the name of the independent variable.

See the example below:

mydata_long %>% 
  cohens_d(attention ~ drink, paired = TRUE)

Checking Normality for Repeated Measures t-tests

Remember we need to check normality using the difference score. First we need to calculate this.

diff_score = mydata$NULL1 - mydata$NULL2

diff_score = mydata$redcow - mydata$redcowX #An example of this code

Change NULL1 to the name of the first condition/group.

Change NULL2 to the name of the second condition/group.

We will now use this diff_score for the steps below:


Checking Normality with a Histogram and a Shapiro Test

hist(diff_score) #creates histogram for the diff_score

shapiro.test(diff_score) #runs Shapiro test for diff_score
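
The reason normality is checked on the difference score is that a repeated t-test is the same calculation as a one sample t-test on the difference scores, compared against 0. The sketch below shows the two giving identical results, using made-up scores.

```r
# Made-up wide data: one row per participant, one column per condition
mydata = data.frame(id = 1:5,
                    redcow = c(12, 15, 11, 14, 13),
                    redcowX = c(14, 18, 13, 15, 16))

diff_score = mydata$redcow - mydata$redcowX

# Repeated (paired) t-test on the two conditions
paired_t = t.test(mydata$redcow, mydata$redcowX, paired = TRUE)

# One sample t-test on the difference scores against 0
diff_t = t.test(diff_score, mu = 0)

# Both comparisons return TRUE: same t value and same p-value
all.equal(unname(paired_t$statistic), unname(diff_t$statistic))
all.equal(paired_t$p.value, diff_t$p.value)
```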

Running a One Sample T-test

The T-test

t.test(mydata_choc$NULL1, mu = NULL2)

Change NULL1 to the variable name.

Change NULL2 to the reference value.

See the example below:

t.test(mydata_choc$weight, mu = 30)

5) Independent One-Way ANOVA

#for webpage:

output:
  html_document:
    theme: flatly
    highlight: haddock
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false

Additional code for workshop sheets. Workshop 2, t-tests, exercise 6

ggplot(mydata, aes(x = drink, y = bpm)) +
  geom_boxplot() +
  labs(title = "Beats Per Minute for RedCow vs Water",
       x = "Drink",
       y = "Beats Per Minute (BPM)") +
  theme_classic()

Boxplot for workshop 3 exercise 2

ggplot(mydata_long, aes(x = drink, y = attention, fill = drink)) +
  geom_boxplot() +
  theme_classic() 

Workshop 3 optional exercises.

ggplot(mydata_long, aes(x = drink, y = attention, fill = drink)) +
  geom_boxplot() +
  labs(title = "Attention Score: RedCow vs RedCowX",
       x = "Condition",
       y = "Attention Score") +
  theme_classic() +
  scale_fill_manual(values = c("grey", "grey"))

#line by line explanation of the boxplot code above
#line 1: call ggplot, use your data file, aes(x = x axis variable, y = y axis variable, fill = condition/group variable)
#line 2: ask for a boxplot using geom_boxplot()
#lines 3-5: add your title and axis labels
#line 6: pick a theme for how the figure looks
#line 7: pick which colours to fill the boxplot.

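#bar chart of the mean attention score with SD error bars (mean_sdl needs the Hmisc package to be installed)
ggplot(mydata_long, aes(x = drink, y = attention)) +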
  geom_bar(stat = "summary", fun = "mean", fill = "lightgrey", width = 0.7) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.2) +
  labs(title = "Average Attention Score with SD: RedCow vs RedCowX",
       x = "Condition",
       y = "Mean Attention Score") +
  theme_classic()