
The Woodward Reaching Task

Cast your mind back to the end of Lecture 2, where we talked about Amanda Woodward’s experiments on infants’ perception of goals. Remember that she familiarized infants to a hand or a rod reaching for one of two objects. Then, she showed them test trials in which the positions of the objects had been switched. She showed that when infants had viewed a hand, they looked longer when that hand went to a new object, even if it traced the same path as before. However, if infants had viewed a rod, they looked longer if that rod went to the old object, tracing a different path than before.

From this, she inferred that infants perceive the hand as having a goal.

You can see examples of the test trials below.

[Figures: Reaching Hand test trials; Reaching Rod test trials]

In this exercise, we will redo Woodward’s analysis, processing the data and then graphing it. Note, though, that we are not using her original data – rather, I simulated some data to match the basic results of her study.

Download the data from Learn

You can download the datafile, called woodward_data.csv, from this week’s Learn resource. Then, create an R script in the same directory as your datafile, and read the data in using the code below. Process and graph the data by following along with this guide.

Load some R packages

You may need to install these packages (using install.packages) before loading them.

library(ggplot2)
library(dplyr)

Load and check the data

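# stringsAsFactors = TRUE makes summary() count the levels of the text columns, as shown below
woodward_data = read.csv("woodward_data.csv", stringsAsFactors = TRUE)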
summary(woodward_data)
##        n               TestEvent  Familiarization       LT        
##  Min.   : 1.00   NewGoal    :96   Hand:96         Min.   : 1.430  
##  1st Qu.: 8.75   NewLocation:96   Rod :96         1st Qu.: 7.149  
##  Median :16.50                                    Median :10.671  
##  Mean   :16.50                                    Mean   :11.971  
##  3rd Qu.:24.25                                    3rd Qu.:15.703  
##  Max.   :32.00                                    Max.   :30.000  
##   TrialNumber
##  Min.   :1   
##  1st Qu.:1   
##  Median :2   
##  Mean   :2   
##  3rd Qu.:3   
##  Max.   :3

Here, the different columns mean:

  • n = subject number (i.e., which participant is which)
  • TestEvent = Whether the test trial involves a new goal or a new location
  • Familiarization = Whether infants saw a hand reaching, or a rod
  • LT = Looking Time on each trial
  • TrialNumber = Which of the three trials in a condition this row is (participants took part in three trials in each condition)

Examine the distribution of looking times

Create a histogram of looking times

hist(woodward_data$PutTheRightColumnNameHere)

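You can see that the data is positively skewed: most looking times are short, with a long tail of longer looks. The summary above shows the same thing, since the mean (11.97) is higher than the median (10.67).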
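If you want to check skew without relying on the picture alone, comparing the mean and the median is a quick way to do it. The sketch below uses a handful of made-up looking times (toy_lt is a hypothetical variable, not the course data) just to show the idea.

```r
# Toy looking times in seconds (made-up values, not the course data)
toy_lt <- c(3.2, 4.1, 5.0, 5.6, 6.3, 7.8, 9.4, 12.5, 18.0, 30.0)

# In a positively skewed sample, the long upper tail pulls the mean above the median
mean(toy_lt)
median(toy_lt)

# The histogram shows the same thing visually: most values low, a few very high
hist(toy_lt, breaks = 6, main = "Toy looking times", xlab = "Looking time (s)")
```

You can run the same two summary functions on the looking time column of your own data and compare them with what the histogram shows.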

Process the data for graphing

The graphs that you see in papers typically display by-subject averages. That means you first compute each subject’s mean result in each condition, and then compute the overall mean in each condition across the different subjects. Remember that each subject here did 3 trials per condition. So what we will do first is, for each subject, average across those three trials in each condition.

To do that, we will use some dplyr functions that make grouping and averaging easy.

subject_average = woodward_data %>%
   group_by(COLUMN_FOR_SUBJECT_NUMBER, TEST_EVENT_COLUMN,FAMILIARIZATION_COLUMN) %>% #<-- FILL THIS IN
   summarise(LT.mean = FUNCTION_FOR_MEAN(LT)) 

summary(subject_average)
hist(subject_average$COLUMN_FOR_LOOKING_TIME_MEAN)

You can see a few interesting things here.

  • The pipe %>% operator takes the woodward_data variable and passes it to the function on the next line, which is called group_by.
  • The group_by function tells R that we want to group our data by which subject produced it, and which condition it came from, but we do not want to group by other variables, such as TrialNumber. That means we will end up ignoring trial number on the next line (which is what we want to do, because we want to average over it).
  • The summarise function says to create a new variable, LT.mean, which is equal to the mean of LT.
  • Because of the group_by call above, the mean will be calculated for each subject and condition, across the different trials that the subject was in.
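To see the group_by and summarise pattern on something small, here is a minimal sketch with a toy data frame. All of the names and values in it (toy, child, condition, trial, score) are made up for illustration, so you will still need to work out the right column names for the Woodward data yourself.

```r
library(dplyr)

# Toy data: 2 children, 2 conditions, 2 trials each (names and values are made up)
toy <- data.frame(
  child = c(1, 1, 1, 1, 2, 2, 2, 2),
  condition = c("A", "A", "B", "B", "A", "A", "B", "B"),
  trial = c(1, 2, 1, 2, 1, 2, 1, 2),
  score = c(10, 12, 4, 6, 20, 22, 8, 10)
)

# Group by child and condition, but not by trial, so the trials get averaged over
toy_means <- toy %>%
  group_by(child, condition) %>%
  summarise(score.mean = mean(score))

toy_means
```

The result has one row per child and condition (four rows here), and the trial column has disappeared because it was not part of the grouping.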

Your result should look like this:

##        n               TestEvent  Familiarization    LT.mean      
##  Min.   : 1.00   NewGoal    :32   Hand:32         Min.   : 2.008  
##  1st Qu.: 8.75   NewLocation:32   Rod :32         1st Qu.: 6.919  
##  Median :16.50                                    Median :10.537  
##  Mean   :16.50                                    Mean   :11.971  
##  3rd Qu.:24.25                                    3rd Qu.:15.784  
##  Max.   :32.00                                    Max.   :30.000

## # A tibble: 64 x 4
## # Groups:   n, TestEvent [?]
##        n TestEvent   Familiarization LT.mean
##    <int> <fct>       <fct>             <dbl>
##  1     1 NewGoal     Rod                9.48
##  2     1 NewLocation Rod               14.1 
##  3     2 NewGoal     Rod               11.7 
##  4     2 NewLocation Rod               11.4 
##  5     3 NewGoal     Rod               20.9 
##  6     3 NewLocation Rod               23.8 
##  7     4 NewGoal     Rod               11.1 
##  8     4 NewLocation Rod               14.9 
##  9     5 NewGoal     Rod                2.90
## 10     5 NewLocation Rod                8.52
## # ... with 54 more rows

Now, let’s get the average for each condition

condition_average = subject_average %>%
   group_by(FAMILIARIZATION_COLUMN, TEST_EVENT_COLUMN) %>% #<-- FILL THIS IN
   summarise(LT.Grand.Mean = FUNCTION_FOR_MEAN(LT.mean),
             LT.sd = FUNCTION_FOR_STANDARD_DEVIATION(LT.mean))

condition_average
## # A tibble: 4 x 4
## # Groups:   Familiarization [?]
##   Familiarization TestEvent   LT.Grand.Mean LT.sd
##   <fct>           <fct>               <dbl> <dbl>
## 1 Hand            NewGoal             15.9   6.71
## 2 Hand            NewLocation         12.8   7.24
## 3 Rod             NewGoal              8.22  4.92
## 4 Rod             NewLocation         10.9   5.02
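One thing to watch for with summarise: expressions are evaluated in order, and a later expression sees any column created earlier in the same call. That is why the output above has a separate LT.Grand.Mean column. The sketch below shows the pitfall on a toy table (toy_means and its columns are made-up names).

```r
library(dplyr)

# Toy by-subject means: three subjects in each of two conditions (made-up values)
toy_means <- data.frame(
  condition = c("A", "A", "A", "B", "B", "B"),
  score.mean = c(10, 12, 14, 4, 6, 8)
)

# Pitfall: score.mean is overwritten by its own mean first, so sd() then sees
# a single number per group and returns NA
overwritten <- toy_means %>%
  group_by(condition) %>%
  summarise(score.mean = mean(score.mean),
            score.sd = sd(score.mean))

# Safer: give the grand mean a new name, so sd() still sees the subject means
renamed <- toy_means %>%
  group_by(condition) %>%
  summarise(score.grand.mean = mean(score.mean),
            score.sd = sd(score.mean))

overwritten
renamed
```

If your standard deviation column comes out as NA, checking for a reused column name is a good first step.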

Graph the data

Now let’s graph the data, using ggplot2 as we did last time. Look back to the last exercise to see how to add the error bars to your chart.

ggplot(condition_average, aes(x = WHAT_VARIABLE_ON_X_AXIS,
                              y = WHAT_VARIABLE_ON_Y_AXIS,
                              fill = WHAT_VARIABLE_DECIDES_COLORS_OF_BARS))+
  geom_col(position = "dodge")+
  geom_errorbar(aes(ymin = WHAT_VALUE, ymax = WHAT_VALUE), position = "dodge")
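If your error bars come out as wide as the bars themselves, or do not sit neatly over them, one common approach is to give both layers the same position_dodge object and set a narrower width for the error bars. Here is a minimal sketch on a made-up summary table (toy_summary and its column names are hypothetical, not the Woodward data).

```r
library(ggplot2)

# Toy summary table (made-up names and values): one row per bar
toy_summary <- data.frame(
  group = c("G1", "G1", "G2", "G2"),
  condition = c("A", "B", "A", "B"),
  score.mean = c(12, 9, 6, 8),
  score.sd = c(3, 2.5, 2, 2.2)
)

# Using the same dodge width for bars and error bars keeps each error bar
# centred on its own bar; width = 0.2 just makes the caps narrower
dodge <- position_dodge(width = 0.9)

p <- ggplot(toy_summary, aes(x = group, y = score.mean, fill = condition)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = score.mean - score.sd,
                    ymax = score.mean + score.sd),
                position = dodge, width = 0.2)
p
```

The dodge width of 0.9 is an assumption that matches the default bar width; if you change the width of the bars, change it to match.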

You can see that infants’ looking at the two kinds of test trial differs depending on what they were familiarized with.

Note also that the error bars here are very wide. That’s because we are plotting the standard deviation of the subject means in each condition, rather than the standard error of the mean.

If you like, you can try to figure out how to edit the code above to plot the standard error instead. Remember that the standard error of the mean in a condition is the standard deviation divided by the square root of the number of participants in that condition (here, 16). You can always email me, or come to office hours, if you have any questions.
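As a reminder of the arithmetic only, here is a small sketch of the standard error calculation on a made-up vector of subject means (subject_means and the other names are hypothetical). Fitting it into the dplyr and ggplot2 code above is still left to you.

```r
# Toy subject means for one condition (made-up values)
subject_means <- c(10, 12, 14, 16, 18, 20, 22, 24)

n_subjects <- length(subject_means)     # 8 subjects in this toy example
toy_sd <- sd(subject_means)             # spread of the subject means
toy_se <- toy_sd / sqrt(n_subjects)     # standard error of the mean

toy_sd
toy_se
```

The standard error is always smaller than the standard deviation when there is more than one participant, which is why error bars based on it will be narrower.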