Welcome to the PS2010 Code Book. Here you will find a collection of the R code used for all data analyses covered across the module. Use the menu on the left-hand side to navigate.
Some important tips:
The data used in all of this code will be referred to as “mydata”. Make sure you change this if you have called your data something else.
Wherever you see NULL, NULL1, NULL2, etc., you will need to add
information based on the data set you are using.
install.packages(NULL)
install.packages("tidyverse") #An example of how this code works
Change NULL to the name of the package.
library(NULL)
library(tidyverse) #An example of how this code works
Change NULL to the name of the package.
install.packages("tidyverse")
install.packages("rstatix")
install.packages("car")
setwd(NULL)
setwd("C:/Users/username/documents") #An example of how this code works
Change NULL to the folder path.
Remember you can also set the working directory using the menu options:
Session -> Set Working Directory -> Choose Directory
getwd()
This will tell you the current working directory.
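A common cause of "file does not exist" errors is that the data file is not in the working directory. The sketch below shows one way to check before reading the data. The temporary folder and the file it creates are made up purely for illustration; in practice you would point setwd() at your own folder.

```r
# Illustration only: a temporary folder stands in for your own project folder.
demo_dir = tempfile("ps2010_demo_")
dir.create(demo_dir)
write.csv(data.frame(id = 1:3, bpm = c(60, 62, 64)),
          file.path(demo_dir, "experiment_data.csv"), row.names = FALSE)

old_wd = setwd(demo_dir)  # setwd() hands back the previous working directory
getwd()                   # confirm where R is now looking
list.files()              # list the files R can see in that folder
file_found = file.exists("experiment_data.csv")  # TRUE if the file is in the working directory
file_found

setwd(old_wd)             # go back to where we started
```

If file.exists() gives FALSE, either the working directory or the file name (including the ".csv" ending) needs correcting.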
mydata = read_csv(NULL)
mydata = read_csv("experiment_data.csv") #An example of how this code works
mydata is the object that we save the data under. You
can call it whatever you like.
You can see the data in the Environment panel (top right in RStudio).
Change NULL to the name of the file.
Remember to add “.csv” at the end of the file name.
read_csv() only reads .csv data files; other file types need a different function.
Note: read_csv() requires the tidyverse package.
head(NULL)
head(mydata) #An example of how this code works
names(NULL)
names(mydata) #An example of how this code works
summary(NULL)
summary(mydata) #An example of how this code works
Change NULL to the name of the object/data file you want
to summarise.
NULL1 = mydata %>%
group_by(NULL2) %>%
summarise(
mean = mean(NULL3), sd = sd(NULL3) )
Change NULL1 to the name of the object/data file you
want to create (i.e., the one that will hold the descriptive statistics).
Change NULL2 to the name of the variable you want to
split the data by.
Change NULL3 to the name of the dependent variable you
want the descriptive statistics for.
See the example below:
summary_by_group = mydata %>%
group_by(drink) %>%
summarise(mean = mean(bpm), sd = sd(bpm))
Line-by-line analysis:
The first line creates an object called
summary_by_group, which takes mydata AND
THEN (this is what %>% means)
The second line groups mydata by the grouping variable, in
this example called drink, AND THEN
The third line calls the summarise() function. It creates a
column called mean holding the mean of our dependent
variable, using mean(bpm), and a column called sd holding
the standard deviation of our dependent variable, using
sd(bpm).
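To see what this code produces, here is a small sketch run on made-up data (three participants per drink; the values are invented for illustration). The extra n = n() column is an addition not used elsewhere in this code book; it simply counts the rows in each group.

```r
library(dplyr)

# Made-up data for illustration: 3 participants per drink
mydata = data.frame(
  drink = c("water", "water", "water", "redcow", "redcow", "redcow"),
  bpm   = c(60, 62, 64, 70, 74, 78)
)

summary_by_group = mydata %>%
  group_by(drink) %>%
  summarise(mean = mean(bpm), sd = sd(bpm), n = n())

print(summary_by_group)
# redcow: mean = 74, sd = 4, n = 3
# water:  mean = 62, sd = 2, n = 3
```

The result has one row per level of the grouping variable and one column per statistic requested inside summarise().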
To get descriptive statistics for the whole sample rather than for each
group, use the code above again but remove the group_by() line of code.
See the example below:
summary_by_group = mydata %>%
summarise(mean = mean(bpm), sd = sd(bpm))
install.packages("rstatix")
library(tidyverse)
library(rstatix)
NULL1 = t.test(NULL2 ~ NULL3, mydata, var.equal = FALSE)
Change NULL1 to the name of the object to store the
t-test results.
Change NULL2 to the name of the dependent variable.
Change NULL3 to the name of the independent
variable.
See the example below:
t_test = t.test(bpm ~ drink, mydata, var.equal = FALSE)
print(t_test) #this will print the t-test results to the console.
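Because the t-test results are stored in an object, the individual values can also be pulled out by name, which is handy when writing up. A minimal sketch on made-up data (the bpm values are invented for illustration):

```r
# Made-up data for illustration: 5 participants per drink
mydata = data.frame(
  drink = rep(c("water", "redcow"), each = 5),
  bpm   = c(60, 62, 64, 61, 63, 70, 74, 78, 72, 76)
)

t_test = t.test(bpm ~ drink, mydata, var.equal = FALSE)

t_test$statistic  # the t value
t_test$parameter  # the degrees of freedom (usually not a whole number for Welch's test)
t_test$p.value    # the p value
t_test$conf.int   # 95% confidence interval for the difference in means
t_test$estimate   # the two group means
```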
Note: cohens_d() requires the ‘rstatix’ package.
mydata %>%
cohens_d(NULL1 ~ NULL2, var.equal = FALSE)
Change NULL1 to the name of the dependent variable.
Change NULL2 to the name of the independent
variable.
See the example below:
mydata %>%
cohens_d(bpm ~ drink, var.equal = FALSE)
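To make the effect size less of a black box, the sketch below works out Cohen's d by hand in base R. It assumes the formula given in the rstatix documentation for var.equal = FALSE (the difference in means divided by the square root of the average of the two variances); the data are made up for illustration.

```r
# Made-up data for illustration: 5 participants per drink
mydata = data.frame(
  drink = rep(c("water", "redcow"), each = 5),
  bpm   = c(60, 62, 64, 61, 63, 70, 74, 78, 72, 76)
)

redcow = mydata$bpm[mydata$drink == "redcow"]
water  = mydata$bpm[mydata$drink == "water"]

# Assumed formula: mean difference / sqrt(average of the two group variances)
d_by_hand = (mean(redcow) - mean(water)) / sqrt((var(redcow) + var(water)) / 2)
d_by_hand
# 4.8 for this made-up data
```

The sign of d depends only on which group is subtracted from which, so it is the size of the value that is interpreted.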
hist(mydata$bpm[mydata$NULL1 == "NULL2"]) #creates histogram for a variable
hist(mydata$bpm[mydata$drink == "water"]) #An example of how this code works
Change NULL1 to the name of the grouping variable.
Change NULL2 to the name of the condition/group you want
to plot.
shapiro.test(mydata$bpm[mydata$NULL1 == "NULL2"]) #runs the Shapiro-Wilk normality test for one group
shapiro.test(mydata$bpm[mydata$drink == "water"]) #An example of how this code works
Change NULL1 to the name of the grouping variable.
Change NULL2 to the name of the condition/group you want
to test.
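The normality check has to be run once per group. As a possible shortcut, the sketch below uses base R's tapply() to run shapiro.test() on every group in one go. The data are randomly generated for illustration, and shapiro_p is just a name chosen here.

```r
set.seed(2010)  # makes the made-up data reproducible
mydata = data.frame(
  drink = rep(c("water", "redcow"), each = 20),
  bpm   = c(rnorm(20, mean = 62, sd = 5), rnorm(20, mean = 74, sd = 5))
)

# Apply shapiro.test() to bpm separately within each level of drink
# and keep only the p value from each test
shapiro_p = tapply(mydata$bpm, mydata$drink, function(x) shapiro.test(x)$p.value)
shapiro_p  # one p value per group
```

A p value above .05 is usually read as no evidence that the group departs from normality.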
install.packages("rstatix")
library(tidyverse)
library(rstatix)
Make sure that the data set is in long format. If it isn’t, you can use the code below to change it. Ensuring your data is in the correct format is part of data wrangling. The code below will work for a simple wide-format data set with 2 columns of data (excluding ID).
mydata_long = mydata %>%
pivot_longer(cols = c(NULL1, NULL2),
names_to = "NULL3",
values_to = "NULL4")
Change NULL1 and NULL2 to the names of the columns holding your groups/conditions.
Change NULL3 to the name of the new variable that will hold the
groups/conditions.
Change NULL4 to the name of the new variable that will hold the values (usually the dependent variable name).
See the example below:
mydata_long = mydata %>%
pivot_longer(cols = c(redcow, redcowX),
names_to = "drink",
values_to = "attention")
The above code will create a new data set called mydata_long, which
will contain all the same values but now in a long format. This uses the
pivot_longer() function.
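To show what the reshaping does, here is a small sketch on made-up wide data (three participants; the scores are invented for illustration). Each participant starts with one row and ends up with two, one per drink.

```r
library(dplyr)
library(tidyr)

# Made-up wide data: one row per participant, one column per condition
mydata = data.frame(
  id      = 1:3,
  redcow  = c(12, 15, 11),
  redcowX = c(14, 18, 15)
)

mydata_long = mydata %>%
  pivot_longer(cols = c(redcow, redcowX),
               names_to = "drink",
               values_to = "attention")

print(mydata_long)
# 6 rows: each participant now appears twice, once per drink
```

The old column names (redcow, redcowX) become the values of the new drink column, and the scores move into the new attention column.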
NULL1 = t.test(NULL2 ~ NULL3, mydata_long, paired = TRUE)
Change NULL1 to the name of the object to store the
t-test results.
Change NULL2 to the name of the dependent variable.
Change NULL3 to the name of the independent
variable.
See the example below:
redcow_t = t.test(attention ~ drink, mydata_long, paired = TRUE)
print(redcow_t) #this will print the t-test results to the console.
Note: cohens_d() requires the ‘rstatix’ package.
mydata_long %>%
cohens_d(NULL1 ~ NULL2, paired = TRUE)
Change NULL1 to the name of the dependent variable.
Change NULL2 to the name of the independent
variable.
See the example below:
mydata_long %>%
cohens_d(attention ~ drink, paired = TRUE)
Remember we need to check normality using the difference score. First we need to calculate this.
diff_score = mydata$NULL1 - mydata$NULL2
diff_score = mydata$redcow - mydata$redcowX #An example of this code
Change NULL1 to the name of the first
condition/group.
Change NULL2 to the name of the second
condition/group.
We will now use this diff_score for the steps below:
hist(diff_score) #creates histogram for the diff_score
shapiro.test(diff_score) #runs Shapiro test for diff_score
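The reason normality is checked on the difference score is that a paired t-test is the same thing as a one-sample t-test on the differences, tested against 0. The sketch below demonstrates this on made-up wide-format data (the scores are invented for illustration). It uses the two-vector form of t.test(), which works directly on the wide columns and may also be useful if your version of R does not accept paired = TRUE together with a formula.

```r
# Made-up wide data for illustration: 6 participants, 2 conditions
mydata = data.frame(
  id      = 1:6,
  redcow  = c(12, 15, 11, 14, 13, 16),
  redcowX = c(14, 18, 15, 15, 17, 19)
)

diff_score = mydata$redcow - mydata$redcowX

paired_result     = t.test(mydata$redcow, mydata$redcowX, paired = TRUE)
one_sample_result = t.test(diff_score, mu = 0)

paired_result$statistic      # t value from the paired test
one_sample_result$statistic  # t value from the one-sample test on the differences

# The two t values are identical
same_t = isTRUE(all.equal(unname(paired_result$statistic),
                          unname(one_sample_result$statistic)))
same_t
```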
t.test(mydata_choc$NULL1, mu = NULL2)
Change NULL1 to the variable name.
Change NULL2 to the reference value.
See example below:
t.test(mydata_choc$weight, mu = 30)
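The one-sample t value is the difference between the sample mean and the reference value, divided by the standard error of the mean. The sketch below works this out by hand and compares it with t.test(). The weights are made up for illustration, and mu0 and t_by_hand are names chosen here.

```r
# Made-up data for illustration: 8 chocolate bar weights
mydata_choc = data.frame(weight = c(29.1, 30.4, 28.7, 31.2, 29.5, 28.9, 30.1, 29.3))
mu0 = 30  # the reference value

n = length(mydata_choc$weight)
# t = (sample mean - reference value) / (sd / square root of n)
t_by_hand = (mean(mydata_choc$weight) - mu0) / (sd(mydata_choc$weight) / sqrt(n))

result = t.test(mydata_choc$weight, mu = mu0)

t_by_hand
result$statistic  # matches t_by_hand
result$parameter  # degrees of freedom, n - 1 = 7
```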
Additional code for workshop sheets. Workshop 2 t-tests, exercise 6
ggplot(mydata, aes(x = drink, y = bpm)) +
geom_boxplot() +
labs(title = "Beats Per Minute for RedCow vs Water",
x = "Drink",
y = "Beats Per Minute (BPM)") +
theme_classic()
Boxplot for workshop 3 exercise 2
ggplot(mydata_long, aes(x = drink, y = attention, fill = drink)) +
geom_boxplot() +
theme_classic()
Workshop 3 optional exercises.
ggplot(mydata_long, aes(x = drink, y = attention, fill = drink)) +
geom_boxplot() +
labs(title = "Attention Score: RedCow vs RedCowX",
x = "Condition",
y = "Attention Score") +
theme_classic() +
scale_fill_manual(values = c("grey", "grey"))
#line-by-line explanation (lines counted from the ggplot() call above)
#line 1: call ggplot(), use your data file, aes(x = x axis variable, y = y axis variable, fill = condition/group variable)
#line 2: ask for a boxplot using geom_boxplot()
#lines 3-5: add your title and axis labels
#line 6: pick a theme for how the figure looks
#line 7: pick which colours to fill the boxplots with
ggplot(mydata_long, aes(x = drink, y = attention)) +
geom_bar(stat = "summary", fun = "mean", fill = "lightgrey", width = 0.7) +
#note: mean_sdl needs the Hmisc package to be installed
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.2) +
labs(title = "Average Attention Score with SD: RedCow vs RedCowX",
x = "Condition",
y = "Mean Attention Score") +
theme_classic()
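With mult = 1, the error bars in the plot above run from the mean minus 1 SD to the mean plus 1 SD. The sketch below calculates those values in base R so they can be checked against the figure. The data are made up for illustration, and bar_values is just a name chosen here.

```r
# Made-up long-format data for illustration: 3 scores per condition
mydata_long = data.frame(
  drink     = rep(c("redcow", "redcowX"), each = 3),
  attention = c(11, 13, 15, 12, 16, 20)
)

means = tapply(mydata_long$attention, mydata_long$drink, mean)  # bar heights
sds   = tapply(mydata_long$attention, mydata_long$drink, sd)    # spread in each condition

bar_values = data.frame(
  drink = names(means),
  mean  = as.numeric(means),
  lower = as.numeric(means - sds),  # bottom of the error bar
  upper = as.numeric(means + sds)   # top of the error bar
)
bar_values
# redcow:  mean 13, error bar from 11 to 15
# redcowX: mean 16, error bar from 12 to 20
```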