In this WPA, you will analyze data from another fake study. In this fake study, the researchers were interested in whether playing video games had cognitive benefits compared to other leisure activities. In the study, 90 university students were asked to do one of 3 leisure activities for 1 hour a day for the next month: 30 participants were asked to play video games, 30 to read, and 30 to juggle. At the end of the month, each participant did 3 cognitive tests: a problem solving test (logic), a reflex/response test (reflex), and a written comprehension test (comprehension).
The data are located in a tab-delimited text file at https://www.dropbox.com/s/h6gcdkskgs9nnb4/leisure.txt?dl=1. Here is how the first few rows of the data should look:
head(leisure)
## id age gender activity logic reflex comprehension
## 1 1 26 m reading 88 13.7 72
## 2 2 31 m reading 85 11.8 83
## 3 3 38 m reading 82 5.8 67
## 4 4 24 m reading 102 18.0 66
## 5 5 30 f reading 48 14.0 62
## 6 6 31 m reading 61 14.1 58
The data file has 90 rows and 7 columns. Here are the columns:
id: The participant ID
age: The age of the participant
gender: The gender of the participant
activity: Which leisure activity the participant was assigned for the last month (“reading”, “juggling”, “gaming”)
logic: Score out of 120 on a problem solving task. Higher is better.
reflex: Score out of 25 on a reflex test. Higher indicates faster reflexes.
comprehension: Score out of 100 on a reading comprehension test. Higher is better.
Open a new script and enter your name, the date, and the WPA number at the top. Remember to set the working directory as in previous weeks. Save the script in the R folder as wpa_9_LASTFIRST.R, where LAST and FIRST are your last and first names.
The data are stored in a tab-delimited text file located at https://www.dropbox.com/s/h6gcdkskgs9nnb4/leisure.txt?dl=1. Using read.table(), load the data into R as a new object called leisure.
Look at the first few rows of the dataframe with the head() function to make sure it loaded correctly.
Using the str() function, look at the structure of the dataframe to make sure everything looks OK.
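The loading and checking steps above can be sketched on a small stand-in file. This is only an illustration: the temporary file and the object name leisure.sample are assumptions made so the example runs without a download, and the two data rows are copied from the head() output shown earlier.

```r
# Write a two-row sample in the same tab-delimited layout as leisure.txt
tmp.file <- tempfile(fileext = ".txt")
writeLines(c("id\tage\tgender\tactivity\tlogic\treflex\tcomprehension",
             "1\t26\tm\treading\t88\t13.7\t72",
             "2\t31\tm\treading\t85\t11.8\t83"),
           con = tmp.file)

# header = TRUE: the first line holds the column names
# sep = "\t": columns are separated by tabs
leisure.sample <- read.table(file = tmp.file, header = TRUE, sep = "\t")

head(leisure.sample)  # Look at the first rows
str(leisure.sample)   # Check the type of each column
```

For the real data, the file argument would be the URL given above instead of tmp.file.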
Before proceeding, look at the help file for function (help("function")) if you haven’t already.
Write a function called feed.me() that takes a string food as an argument, and returns the sentence “I love to eat ___”. Try your function by running feed.me("apples").
Press code for a hint:
feed.me <- function(___) {
  output <- paste0("I love to eat ", ___)
  print(___)
}
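As a side note on the difference between printing and returning: print() shows a value in the console, while return() hands the value back so that it can be stored in an object and reused. A minimal sketch, using a hypothetical function add.exclaim() that is not part of the assignment:

```r
# Hypothetical example function (not part of the assignment)
add.exclaim <- function(word) {
  output <- paste0(word, "!")  # Build the string
  return(output)               # Hand the value back to the caller
}

shout <- add.exclaim("hello")  # Nothing is printed; the value is stored
print(shout)                   # Now display it
nchar(shout)                   # The stored value can be reused
```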
Without using the mean() function, calculate the mean of the vector vec.1 <- seq(1, 100, 5). (Hint: use sum() and length().)
Write a function called my.mean() that takes a vector x as an argument, and returns the mean of the vector x. Use your code for Question 6 as your starting point. Test it on the vector from Question 6.
Press code for a hint:
my.mean <- function(___) {
  result <- sum(___) / length(___)
  return(result)
}
Use your my.mean() function to calculate the mean ‘logic’ rating of participants in the leisure dataset, and compare the result to the built-in mean() function to make sure you get the same result!
Press code for a hint:
for(i in ___) {
  square.i <- ___
  print(square.i)
}
squares. You’ll need to pre-create a vector, and use indexing to update it.
In this section you are going to start with a basic task you want to perform. In this case, we want to create a copy of our dataset where all the test variables (logic, reflex and comprehension) have been normalized. We will use this task first to demonstrate the benefit of using loops, then the benefit of using custom functions.
First, create a copy of the data.frame leisure. Call this copy zleisure.manual.
Using indexing by column name, normalise the logic column in your new dataset (do not use $ to call the column). Do not create a new column, just overwrite the logic column with the normalised z-scores. Remember that to normalise a score, also called z-transforming it, you first subtract the mean score from the individual scores, then divide by the standard deviation.
Press code to check your answer:
zleisure.manual[,"logic"]<- (leisure[,"logic"] - mean(leisure[,"logic"]) )/sd(leisure[,"logic"])
#or
mean.score<- mean(leisure[,"logic"])
sd.score<- sd(leisure[,"logic"])
zleisure.manual[,"logic"]<- (leisure[,"logic"] -mean.score)/sd.score
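One way to sanity-check a z-transformation is to confirm that the result has a mean of 0 and a standard deviation of 1. The sketch below does this on a toy vector (the six logic scores from the head() output above); the object names scores, z.scores and z.builtin are assumptions for illustration. It also compares the result with the built-in scale() function, which performs the same centring and scaling by default.

```r
# Toy vector: the six logic scores shown in the head() output
scores <- c(88, 85, 82, 102, 48, 61)

# Subtract the mean, then divide by the standard deviation
z.scores <- (scores - mean(scores)) / sd(scores)

mean(z.scores)  # Approximately 0, up to floating-point error
sd(z.scores)    # Approximately 1, up to floating-point error

# scale() returns a one-column matrix, so convert it to a plain vector
z.builtin <- as.numeric(scale(scores))
max(abs(z.scores - z.builtin))  # Effectively 0
```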
Now do the same for reflex and comprehension.
Press code to check your answer:
zleisure.manual[,"reflex"]<- (leisure[,"reflex"] - mean(leisure[,"reflex"]) )/sd(leisure[,"reflex"])
zleisure.manual[,"comprehension"]<- (leisure[,"comprehension"] - mean(leisure[,"comprehension"]) )/sd(leisure[,"comprehension"])
Create another copy of the leisure data.frame called zleisure.loop. We will perform the same normalisation on this data.frame using a simple loop.
First, create a loop that loops through each of the test column names (i.e. logic, reflex and comprehension). This will require a vector of the test column names. For now, have the loop print each column name as it goes through (using print). The output should look as follows.
## [1] "logic"
## [1] "reflex"
## [1] "comprehension"
Press code to check your answer:
for (i in c("logic", "reflex", "comprehension")){
  zleisure.loop[,i]<- (leisure[,i] - mean(leisure[,i]) )/sd(leisure[,i])
}
Check that the logic column you normalised manually is the same as the logic column normalized by the loop.
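A comparison like this could be done with all.equal(), which treats numbers as equal up to a small tolerance, or with identical(), which demands an exact match. The sketch below uses a hypothetical toy data.frame (toy, toy.manual and toy.loop are illustrative names, not objects from the assignment).

```r
# Hypothetical toy data standing in for the leisure data.frame
toy <- data.frame(logic = c(88, 85, 82, 102, 48, 61))

# Version 1: normalise the column directly
toy.manual <- toy
toy.manual[, "logic"] <- (toy[, "logic"] - mean(toy[, "logic"])) / sd(toy[, "logic"])

# Version 2: normalise the column inside a loop over column names
toy.loop <- toy
for (i in c("logic")) {
  toy.loop[, i] <- (toy[, i] - mean(toy[, i])) / sd(toy[, i])
}

# Compare a single column, allowing for floating-point error
all.equal(toy.manual[, "logic"], toy.loop[, "logic"])

# Compare the whole data.frames
all.equal(toy.manual, toy.loop)
```

all.equal() returns TRUE when the objects match and a description of the differences otherwise, so wrapping it in isTRUE() is a common way to get a plain TRUE/FALSE.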
What if we decided that we also want to normalise the participants’ ages? How easy is this to do in our loop?
Now we have a loop that takes the data.frame zleisure.loop and, for the 4 columns age, logic, reflex and comprehension, replaces the raw scores in that column with the z-transformed/normalized scores. This code seems pretty good: it only takes 2 lines, and is much easier than typing out the code to normalize each column manually. It can also be easily expanded to include new columns.
However, what if we wanted to perform the same operation on another dataset?
Load the data below into R as a new object called wpa6.df.
wpa6.df <- read.table(file = "https://www.dropbox.com/s/e9s0b7qnnt510vb/wpa6.txt?dl=1",
                      header = TRUE, # There is a header row
                      sep = "\t")
This data.frame also has several test scores we might want to normalize, in particular the iq, logic and multitasking scores. Write a loop to normalise these scores (use your zleisure loop as a starting point). Call the data.frame with the normalised scores zwpa6.df.
There should be only 3 differences between the code in question 16 and the code in question 20: the names of the data.frames (wpa6.df versus leisure, and zwpa6.df versus zleisure.loop) and the names of the columns. The actual operations being performed are identical, and the structure of the code is identical. When this happens, it’s usually a pretty good sign that you should write a custom function to perform this operation, rather than modifying your code each time.
So now we are going to create a function, which we’ll call zbycolumn, which takes any data.frame and from it creates a new data.frame where the values in a subset of the columns are z-transformed. Since we want to be able to specify the data.frame and the columns that are to be z-transformed, this function will need to accept two arguments: the data.frame (which we will call data) and the columns that are to be z-transformed (which we will call columnnames). Create an empty function that accepts these two arguments.
Press code to check your answer:
zbycolumn=function(data, columnnames){
}
wpa6.df and iq, logic etc. Call the z-transformed version of the data z.data.
Press code to check your answer:
z.data<-data
for (i in columnnames){
  # Subtract the mean first, then divide the difference by the sd
  z.data[, i]<- (data[,i]-mean(data[,i]))/sd(data[,i])
}
return argument.
Press code to check your answer:
zbycolumn=function(data, columnnames){
  z.data<-data
  for (i in columnnames){
    # Subtract the mean first, then divide the difference by the sd
    z.data[, i]<- (data[,i]-mean(data[,i]))/sd(data[,i])
  }
  return(z.data)
}
Use your function on the leisure data.frame again. Call the output of the function zleisure.function and compare it to the manual or loop versions of the dataframe.
Create a scatterplot of age and reflex of participants in the leisure dataset. Customise it, and add gridlines and a regression line.
Create a function called my.plot() that takes arguments x and y and returns a customised scatterplot with gridlines and a regression line:
Press code for a hint:
my.plot <- function(x, y) {
  plot(x = ___,
       y = ___,
       pch = ___, # look at ?points to see the values of pch!
       col = ___)
  grid() # Add gridlines
  # Add a regression line
  abline(lm(___ ~ ___),
         col = ___)
}
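The three ingredients in this template (plot(), grid() and abline() on an lm() fit) can be sketched outside a function on simulated data. The data, object names (x.sim, y.sim, fit) and the specific pch and colour choices below are assumptions for illustration only, not the expected answer.

```r
set.seed(1)  # Make the simulated data reproducible

# Hypothetical simulated data, not the leisure dataset
x.sim <- rnorm(n = 50, mean = 30, sd = 5)
y.sim <- 5 + 0.3 * x.sim + rnorm(n = 50, mean = 0, sd = 2)

plot(x = x.sim,
     y = y.sim,
     pch = 16,            # Filled circles
     col = "steelblue",
     xlab = "x",
     ylab = "y",
     main = "Scatterplot with gridlines and regression line")
grid()                    # Add gridlines

fit <- lm(y.sim ~ x.sim)  # Regress y on x
abline(fit, col = "red")  # Draw the fitted regression line
```

Note that the formula in lm() puts the vertical-axis variable on the left of the ~ and the horizontal-axis variable on the right.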
Use your my.plot() function on the age and reflex of participants in the leisure dataset. It should look like the result of question 25.
my.plot(x = leisure$age, y = leisure$reflex)
Create a loop that prints the mean logic score of participants from each activity group.
Press code to see template:
for(activity.i in c("reading", "juggling", "gaming")) {
  data.i <- leisure$logic[leisure$activity == ___]
  output <- paste0("The mean logic score of people who do ", ___, " is ", ___)
  print(___)
}
Press code for template:
par(mfrow = c(1, 3)) # Set up 1 x 3 plotting grid
for(activity.i in c("reading", "juggling", "gaming")) {
  hist(leisure$comprehension[leisure$activity == ___],
       main = ___,
       xlab = ___,
       col = ___
  )
}
par(mfrow = c(1, 1)) # Reset plotting grid
Create a loop that returns the sum of the vector 1:10. (i.e. Don’t use the existing sum function).
Use this loop to create a function, called my.sum that returns the sum of any vector x. Test it on the logic ratings.
Modify the function you created in question 31 to instead calculate the mean of a vector. Call this new function my.mean2 and compare it to both the my.mean function you created and the built-in mean function. (Bonus: Can you also think of a way to do this without using the length function?)
p.values with 100 NA values.
p.values.
p.values with 100 p-values.
p.values and calculate the proportion of p-values that are significant at the .05 level.
Press code for template:
p.values <- rep(NA, ___)
for(i in ___) {
  x <- rnorm(n = ___, mean = ___, sd = ___)
  result <- t.test(___)$___
  p.values[___] <- ___
}
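The template relies on a general pattern: pre-create a vector of NA values, then fill it in one position per loop iteration. The sketch below shows that pattern with sample means instead of p-values, so it is an analogy rather than the answer; the names n.sim and sample.means, and the choice of 20 repetitions of 10 draws, are assumptions for illustration.

```r
set.seed(100)  # Make the random draws reproducible

n.sim <- 20                     # Hypothetical number of repetitions
sample.means <- rep(NA, n.sim)  # Pre-create a vector of NA values

for (i in 1:n.sim) {
  x <- rnorm(n = 10, mean = 0, sd = 1)  # Draw a new sample
  sample.means[i] <- mean(x)            # Store the result in position i
}

sample.means

# The mean of a logical vector is the proportion of TRUE values
mean(sample.means > 0)
```

The last line uses the fact that TRUE counts as 1 and FALSE as 0, which is a convenient way to calculate the proportion of values that meet a condition.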
Create a function called psimulation with 4 arguments: sim: the number of simulations, samplesize: the sample size, mu.true: the true mean, and sd.true: the true standard deviation. Your function should repeat the simulation from the previous question with the given arguments. That is, it should calculate sim p-values, each testing whether the mean of samplesize samples from a normal distribution with mean = mu.true and standard deviation = sd.true is significantly different from 0. The function should return a vector of p-values.
Save and email your ‘.R’ file wpa_9_LastFirst.R to me at ashleyjames.luckman@unibas.ch. Put the subject as WPA9-23496-02.