In this WPA, you will analyze data from another fake study. In this study, the researchers were interested in whether playing video games has cognitive benefits compared to other leisure activities. 90 university students were asked to do one of 3 leisure activities for 1 hour a day for a month: 30 participants were asked to play video games, 30 to read and 30 to juggle. At the end of the month, each participant completed 3 cognitive tests: a problem solving test (logic), a reflex/response test (reflex) and a written comprehension test (comprehension).

The data are located in a tab-delimited text file at https://www.dropbox.com/s/h6gcdkskgs9nnb4/leisure.txt?dl=1. Here is how the first few rows of the data should look:

head(leisure)
##   id age gender activity logic reflex comprehension
## 1  1  26      m  reading    88   13.7            72
## 2  2  31      m  reading    85   11.8            83
## 3  3  38      m  reading    82    5.8            67
## 4  4  24      m  reading   102   18.0            66
## 5  5  30      f  reading    48   14.0            62
## 6  6  31      m  reading    61   14.1            58

Datafile description

The data file has 90 rows and 7 columns: id, age, gender, activity, logic, reflex and comprehension.

Data loading and preparation

  1. Open a new script and enter your name, the date, and the WPA number at the top. Remember to set the working directory as in previous weeks. Save the script in the R folder as wpa_9_LASTFIRST.R, where LAST and FIRST are your last and first names.

  2. The data are stored in a tab-delimited text file located at https://www.dropbox.com/s/h6gcdkskgs9nnb4/leisure.txt?dl=1. Using read.table(), load these data into R as a new object called leisure.
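
As a rough illustration of how read.table() handles tab-delimited text with a header row, the sketch below parses a tiny table passed through the text argument instead of a file or URL. The names toy.text and toy.df are invented for this example, and the three rows simply copy part of the preview shown earlier. For the real file you would pass the URL to the file argument instead.

```r
# Hypothetical mini-dataset: columns separated by tabs ("\t"), first line is the header
toy.text <- "id\tage\tgender\tactivity\n1\t26\tm\treading\n2\t31\tm\treading\n3\t38\tm\treading"

toy.df <- read.table(text = toy.text,
                     header = TRUE,  # first line holds the column names
                     sep = "\t")     # columns are tab-separated

dim(toy.df)    # number of rows and columns
names(toy.df)  # column names taken from the header row
```

If header = TRUE is left out, the column names are read as a row of data, so it is worth checking dim() and names() right after loading.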

Understand the data

  3. Look at the first few rows of the dataframe with the head() function to make sure it loaded correctly.

  4. Using the str() function, look at the structure of the dataframe to make sure everything looks ok.

Basic Custom Functions

Before proceeding, look at the help file for function if you haven’t already (help("function")).
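
As a quick reminder of the parts of a function, here is a minimal sketch. The function add.two() is a made-up example and not part of the assignment. It has a name, an argument list, a body, and a value that is handed back with return().

```r
# Hypothetical example function: name, argument list, body, return value
add.two <- function(x) {
  result <- x + 2   # body: work with the argument
  return(result)    # value handed back to the caller
}

add.two(x = 5)
```

Because R arithmetic is vectorised, the same function also works on a whole vector, for example add.two(c(1, 2)).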

  5. Write a function called feed.me() that takes a string food as an argument, and returns the sentence “I love to eat ___”. Try your function by running feed.me("apples").

Press code for a hint:

feed.me <- function(___) {
  
  output <- paste0("I love to eat ", ___)
  
  print(___)
}

  6. Without using the mean() function, calculate the mean of the vector vec.1 <- seq(1, 100, 5). (Hint: use sum() and length().)

  7. Write a function called my.mean() that takes a vector x as an argument, and returns the mean of the vector x. Use your code for Question 6 as your starting point. Test it on the vector from Question 6.

Press code for a hint:

my.mean <- function(___) {
  
  result <- sum(___) / length(___)
  
  return(result)
  
}

  8. Use your my.mean() function to calculate the mean logic score of participants in the leisure dataset, and compare the result to the built-in mean() function to make sure you get the same result!

Basic Loops

  9. Create a loop that prints the squares of the integers from 1 to 10:

Press code for a hint:

for(i in ___) {
  
  square.i <- ___
  
  print(square.i)
  
}

  10. Modify the previous code so that it saves the squared integers as a vector called squares. You’ll need to pre-create a vector, and use indexing to update it.
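
The pattern of pre-creating a vector and filling it by index can be sketched as follows. This example deliberately uses a different calculation (doubling rather than squaring), and the name doubles is made up for illustration.

```r
# Hypothetical example: store the doubled values of 1 to 5
doubles <- rep(NA, 5)        # pre-create a container of the right length

for (i in 1:5) {
  doubles[i] <- i * 2        # fill position i on each pass
}

doubles
```

Starting from a vector of NA values makes it easy to spot positions the loop never filled.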

From Manual code to Loops to Functions

In this section you will start with a basic task: creating a copy of our dataset in which all the test variables (logic, reflex and comprehension) have been normalized. We will use this task first to demonstrate the benefit of using loops, and then the benefit of using custom functions.

  11. First, create a copy of the data.frame leisure. Call this copy zleisure.manual.

  12. Using indexing by column name, normalise the logic column in your new dataset (do not use $ to call the column). Do not create a new column, just overwrite the logic column with the normalised z-scores. Remember that to normalise a score, also called z-transforming it, you first subtract the mean score from the individual scores, then divide by the standard deviation.

Press code to check your answer:

zleisure.manual[, "logic"] <- (leisure[, "logic"] - mean(leisure[, "logic"])) / sd(leisure[, "logic"])

# or
mean.score <- mean(leisure[, "logic"])
sd.score <- sd(leisure[, "logic"])
zleisure.manual[, "logic"] <- (leisure[, "logic"] - mean.score) / sd.score

  13. Use the same method to normalise the other test columns: reflex and comprehension.

Press code to check your answer:

zleisure.manual[, "reflex"] <- (leisure[, "reflex"] - mean(leisure[, "reflex"])) / sd(leisure[, "reflex"])
zleisure.manual[, "comprehension"] <- (leisure[, "comprehension"] - mean(leisure[, "comprehension"])) / sd(leisure[, "comprehension"])
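
A quick way to convince yourself that a z-transformation worked is to check that the result has a mean of 0 and a standard deviation of 1. The sketch below does this on a small invented vector (scores is not part of the leisure data) and cross-checks the result against base R's scale() function, which centres and scales by default.

```r
# Hypothetical scores, invented for illustration
scores <- c(50, 60, 70, 80, 90)

z.scores <- (scores - mean(scores)) / sd(scores)

mean(z.scores)  # should be 0, up to rounding error
sd(z.scores)    # should be 1

# Cross-check against base R's scale(); as.vector() drops the matrix
# structure that scale() returns
all.equal(z.scores, as.vector(scale(scores)))
```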

Use a loop

  14. Create another copy of the leisure data.frame called zleisure.loop. We will perform the same normalisation on this data.frame using a simple loop.

  15. First create a loop that loops through each of the test column names (i.e. logic, reflex and comprehension). This will require a vector of the test column names. For now, have the loop print each column name as it goes through (using print). The output should look as follows.

## [1] "logic"
## [1] "reflex"
## [1] "comprehension"

  16. Now modify the loop from Question 15 so that each time it loops through, it takes the named column and replaces the values in this column with the normalised scores. You already have a loop to go through the column names, and code that takes a column based on its name and replaces the values in that column with the normalized values (see Questions 12 and 13). All you need to do is combine them.

Press code to check your answer:

for (i in c("logic", "reflex", "comprehension")) {
  zleisure.loop[, i] <- (leisure[, i] - mean(leisure[, i])) / sd(leisure[, i])
}

  17. Check that the logic column you normalised manually is the same as the logic column normalized by the loop.

  18. What if we decided that we also want to normalise the participants’ ages? How easy is this to do in our loop?
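
To see why a loop over column names is easy to extend, consider this sketch on a made-up data frame. The objects toy and ztoy and the columns a and b are invented for illustration. Including one more column only means adding its name to the vector the loop runs over.

```r
# Hypothetical data, invented for illustration
toy <- data.frame(age = c(20, 30, 40),
                  a   = c(1, 2, 3),
                  b   = c(10, 20, 60))
ztoy <- toy

# Adding "age" to this vector is the only change needed to include it
for (i in c("a", "b", "age")) {
  ztoy[, i] <- (toy[, i] - mean(toy[, i])) / sd(toy[, i])
}

ztoy
```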

Use a function

Now we have a loop that takes the data.frame zleisure.loop and, for the 4 columns age, logic, reflex and comprehension, replaces the raw scores in each column with the z-transformed (normalized) scores. This code seems pretty good: it only takes 2 lines, and is much easier than typing out the code to normalize each column manually. It can also be easily expanded to include new columns.

However, what if we wanted to perform the same operation on another dataset?

  19. Load the fictional survey data from WPA6. Call it wpa6.df.

wpa6.df <- read.table(file = "https://www.dropbox.com/s/e9s0b7qnnt510vb/wpa6.txt?dl=1",
                      header = TRUE,   # There is a header row
                      sep = "\t")      # Columns are separated by tabs

  20. This data.frame also has several test scores we might want to normalize, in particular the iq, logic and multitasking scores. Write a loop to normalise these scores (use your zleisure loop as a starting point). Call the data.frame with the normalised scores zwpa6.df.

  21. There should be only 3 differences between the code in Question 16 and the code in Question 20: the names of the two data.frames (wpa6.df versus leisure, and zwpa6.df versus zleisure.loop) and the names of the columns. The actual operations being performed are identical, and the structure of the code is identical. When this happens, it’s usually a pretty good sign that you should write a custom function to perform this operation, rather than modifying your code each time. So now we are going to create a function, which we’ll call zbycolumn, which takes any data.frame and from it creates a new data.frame where the values in a subset of the columns are z-transformed. Since we want to be able to specify the data.frame and the columns that are to be z-transformed, this function will need to accept two arguments: the data.frame (which we will call data) and the columns that are to be z-transformed (which we will call columnnames). Create an empty function that accepts these two arguments.

Press code to check your answer:

zbycolumn <- function(data, columnnames) {

}

  22. Now that you have a function that accepts the appropriate arguments, modify your code from Question 20 so that it uses these argument names rather than wpa6.df and iq, logic, etc. Call the z-transformed version of the data z.data.

Press code to check your answer:

  z.data <- data
  for (i in columnnames) {
    # Subtract the mean first, then divide the difference by the standard deviation
    z.data[, i] <- (data[, i] - mean(data[, i])) / sd(data[, i])
  }

  23. Now place this code inside the empty function you created. Remember to add a return statement.

Press code to check your answer:

zbycolumn <- function(data, columnnames) {
  z.data <- data
  for (i in columnnames) {
    # Subtract the mean first, then divide the difference by the standard deviation
    z.data[, i] <- (data[, i] - mean(data[, i])) / sd(data[, i])
  }
  return(z.data)
}

  24. Test your function by performing the normalisation on the leisure data.frame again. Call the output of the function zleisure.function and compare it to the manual or loop versions of the dataframe.
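
One possible way to compare two data frames is all.equal(), which checks for equality up to a small numerical tolerance. The sketch below shows the idea on made-up objects: the toy data frame and the z.col() helper are hypothetical and only stand in for your own data and function.

```r
# Hypothetical data and helper, invented for illustration
toy <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))
z.col <- function(x) (x - mean(x)) / sd(x)

# Version 1: column by column
z.manual <- toy
z.manual[, "a"] <- z.col(toy[, "a"])
z.manual[, "b"] <- z.col(toy[, "b"])

# Version 2: in a loop
z.loop <- toy
for (i in c("a", "b")) {
  z.loop[, i] <- z.col(toy[, i])
}

all.equal(z.manual, z.loop)  # TRUE when the two data frames match
```

When the objects differ, all.equal() returns a description of the differences rather than FALSE, so wrap it in isTRUE() if you need a plain logical value.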

CHECKPOINT!

Functions and Figures

  25. Create a scatterplot of age and reflex of participants in the leisure dataset. Customise it, and add gridlines and a regression line.

  26. Create a function called my.plot() that takes arguments x and y and returns a customised scatterplot with gridlines and a regression line:

Press code for a hint:

my.plot <- function(x, y) {
  
  plot(x = ___, 
       y = ___, 
       pch = ___,     # look at ?points to see the values of pch!
       col = ___)
  
  grid()   # Add gridlines
  
  # Add a regression line
  abline(lm(___ ~ ___), 
         col = ___)
}

  27. Now test your my.plot() function on the age and reflex of participants in the leisure dataset. It should look like the result of Question 25.

my.plot(x = leisure$age, y = leisure$reflex)

More Loops

  28. Using the following template, create a loop that calculates (and prints) the mean logic score of participants from each activity group.

Press code to see template:

for(activity.i in c("reading", "juggling", "gaming")) {
  
  data.i <- leisure$logic[leisure$activity == ___]
  
  output <- paste0("The mean logic score of people who do ", ___, " is ", ___)
  
  print(___)
  
}

  29. Now create a histogram of comprehension scores for each activity using the following structure.

Press code for template:

par(mfrow = c(1, 3)) # Set up 1 x 3 plotting grid

for(activity.i in c("reading", "juggling", "gaming")) {
  
  hist(leisure$comprehension[leisure$activity == ___],
       main = ___,
       xlab = ___,
       col = ___
       )
  
}

par(mfrow = c(1, 1)) # Reset plotting grid

Function with Loops

  30. Create a loop that returns the sum of the vector 1:10 (i.e. don’t use the existing sum function).

  31. Use this loop to create a function called my.sum that returns the sum of any vector x. Test it on the logic scores.

  32. Modify the function you created in Question 31 to instead calculate the mean of a vector. Call this new function my.mean2 and compare it to both the my.mean function you created and the built-in mean function. (Bonus: Can you also think of a way to do this without using the length function?)
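
These questions all rely on the same accumulator pattern: start with an initial value, then update it once per element inside the loop. The sketch below shows the pattern on a different calculation (a running product rather than a sum). The name running.product is made up for illustration.

```r
# Hypothetical example: multiply the numbers 1 to 5 together
running.product <- 1          # starting value (1 for a product)

for (value in 1:5) {
  running.product <- running.product * value  # update on each pass
}

running.product
```

The starting value matters: it should be the value that leaves the result unchanged, which is 1 for a product.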

Simulation!

  33. What is the probability of getting a significant p-value if the null hypothesis is true? Test this by conducting the following simulation:

Press code for template:

p.values <- rep(NA, ___)

for(i in ___) {

  x <- rnorm(n = ___, mean = ___, sd = ___)

  result <- t.test(___)$___

  p.values[___] <- ___

}

  34. Create a function called psimulation with 4 arguments: sim, the number of simulations; samplesize, the sample size; mu.true, the true mean; and sd.true, the true standard deviation. Your function should repeat the simulation from the previous question with the given arguments. That is, it should calculate sim p-values, each testing whether the mean of a sample of samplesize observations from a normal distribution with mean = mu.true and standard deviation = sd.true is significantly different from 0. The function should return a vector of p-values.
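
Two building blocks may help here, sketched below under assumed values (the seed, n = 20, and the example.p vector are all invented for illustration). First, t.test() returns a list, and the p-value can be pulled out with $p.value. Second, once you have a vector of p-values, the proportion below .05 is the mean of a logical comparison.

```r
set.seed(100)  # arbitrary seed so the sketch is reproducible

# One simulated sample where the null is true (the mean really is 0)
x <- rnorm(n = 20, mean = 0, sd = 1)
p <- t.test(x)$p.value   # one-sample t-test against mu = 0 (the default)

p >= 0 & p <= 1          # a p-value always lies between 0 and 1

# Given a vector of p-values, the share below .05 estimates the
# probability of a "significant" result
example.p <- c(0.03, 0.20, 0.51, 0.04, 0.77)  # made-up p-values
mean(example.p < 0.05)
```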

Submit!

Save and email your ‘.R’ file wpa_9_LastFirst.R to me at ashleyjames.luckman@unibas.ch. Put the subject as WPA9-23496-02.